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Part  I 


Query  Mediation 


Chapter  1 

Introduction 


SRI  International  has  completed  a  research  program  that  produced  a  new  and  unique  approach  to 
the  trusted  interoperation  of  autonomous  heterogeneous  databases  containing  data  that  mismatch 
in  semantics,  representations,  and  security  policies. 


1.1  Problem 

The  interoperation  of  heterogeneous  databases  is  a  pressing  need  today  as  organizations  attempt  to 
share  data  stored  in  legacy  databases.  These  databases  are  independently  developed  and  maintained 
to  each  serve  the  needs  of  a  single  organization.  The  exchange  of  data  between  such  databases 
could  be  problematic  not  only  because  of  differences  in  the  representation  (syntax)  of  data  but 
also  because  of  often  subtle  differences  in  the  intended  interpretation  (semantics)  of  data.  Thus, 
although  translators  could  be  constructed  to  reformat  data  from  one  representation  to  another,  such 
a  translation  does  not  guarantee  that  the  combined,  translated  data  are  meaningful  we  could 
be  attempting  to  compare  apples  with  oranges.  When  we  try  to  interoperate  multilevel  secure 
databases  having  different  security  semantics  (e.g.,  element-level  vs.  tuple-level  labeling,  treatment 
of  polyinstantiation)  with  system-high  legacy  databases,  these  same  problems  arise  in  dealing  with 
the  syntax  and  semantics  of  security. 

Currently  there  is  no  technology  that  adequately  addresses  these  problems.  Any  potential 
solution  must  address  the  critical  issues  described  here. 

1.1.1  Semantic  Mismatch 

Heterogeneity  in  the  semantics  of  data  arises  naturally.  The  semantic  differences  are  caused  by 
the  diverse  needs  of  applications.  Moreover,  the  relationships  between  heterogeneous  data  could 
be  incomplete  or  uncertain.  Examples  of  semantic  mismatches  are 

•  Scope.  Tests  for  the  database  in  a  hospital  laboratory  includes  only  single  tests  (e.g.,  sodium), 
whereas  for  the  database  in  an  insurance  company  they  also  include  panels  (e.g.,  electrolyte 
panel),  which  are  collections  of  tests. 
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•  Temporal  Basis.  The  database  in  a  clinic  records  the  transaction  time  for  outpatients’ 
visits,  while  the  database  in  a  hospital  records  the  admission  and  release  time  for  inpatients. 

Direct  comparison  and  combination  of  data  with  such  semantic  mismatches  would  be  mean¬ 
ingless.  The  interoperation  should  be  semantically  meaningful,  in  terms  of  both  the  semantics  of 
individual  databases  and  the  semantic  relationships  between  them. 

1.1.2  Representational  Mismatch 

In  addition  to  semantic  mismatches,  the  same  data  could  be  represented  in  various  incompatible 
structures,  and  the  same  structure  could  be  used  to  represent  data  with  incompatible  semantics 
[50].  The  representational  differences  are  caused  by  the  need  to  bind  data  to  representations  that 
are  most  natural  and  efficient  with  respect  to  specific  applications.  In  general,  there  simply  does 
not  exist  a  universal  representation  that  is  perfect  for  every  application  [5,  15,  34],  Examples  of 
representational  mismatches  are 

•  Identification.  Patients  could  be  identified  by  patient  id  numbers  in  the  hospital,  but  by 
social  security  numbers  (SSNs)  in  the  insurance  company.  The  nature  of  operations  in  these 
two  organizations  demands  that  different  identifiers  be  used,  since  patient  data  are  most  likely 
accessed  by  SSNs,  not  patient  id  numbers,  in  the  insurance  company. 

•  Biased  View.  The  relationship  between  patients,  the  drugs  they  are  allergic  to,  and  the 
description  of  the  symptoms  could  be  represented  as  one  relation,  or  in  two  relations  where 
the  second  relation  captures  physicians’  notes.  It  is  impractical  to  represent  the  relationship 
in  all  possible  structures. 

1.1.3  Security  Mismatch 

The  interoperation  of  heterogeneous  military  and  civilian  databases  could  be  further  complicated 
by  the  fact  that  data  must  be  protected  from  untrusted  access.  The  security  policies  of  these 
databases  could  mismatch  in  many  ways,  examples  of  which  are 

•  Label.  The  same  data  is  classified  as  TOP-SECRET  in  one  database,  but  as  TOP-SECRET 
OUTER-SPACE  in  another. 

•  Security  Representation.  One  database  employs  element-level  classification,  while  another 
chooses  tuple-level  classification. 

•  Security  Semantics.  Classifying  attribute  CARGO  means  in  one  database  that  unautho¬ 
rized  users  should  not  know  about  which  flights  carry  what  cargos,  but  means  in  another  that 
unauthorized  users  should  not  know  about  what  cargos  are  shipped  to  which  destinations. 

Without  trusted  semantic  interoperation,  either  data  in  isolated  military  and  civilian  databases 
will  remain  inaccessible,  or  users  will  run  into  the  risk  of  unauthorized  access  to  their  data  through 
inference  channels. 
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1.1.4  Autonomy 

Because  of  the  diverse  needs  of  autonomous  organizations,  heterogeneity  will  persist  rather  than 
disappear.  To  support  the  interoperation  of  autonomous  heterogeneous  databases  containing  data 
with  semantic,  representational,  and  security  policy  mismatches,  five  critical  issues  must  be  ad¬ 
dressed: 

•  Semantics.  Interoperation  should  be  semantically  meaningful,  in  terms  of  both  the  semantics 
of  individual  databases  and  their  semantic  relationships. 

•  Autonomy.  Database  autonomy  should  be  respected  and  preserved.  Users  should  not  be 
required  to  switch  to  new  query  languages  or  new  schemas  to  access  data  in  multiple  databases. 

•  Automation.  Interoperation  should  be  automated.  Users  should  not  be  required  to  manually 
resolve  all  the  semantic  and  representational  mismatches  to  access  data  in  multiple  databases. 

•  Efficiency.  Automated  interoperation  should  be  computationally  efficient.  In  particular,  it 
should  not  require  expensive  mechanisms  such  as  theorem-proving  in  higher-order  logics. 

•  Security.  Any  access  permitted  within  an  individual  database  must  also  be  permitted  under 
trusted  interoperation.  Any  access  not  permitted  within  an  individual  database  must  be  also 
denied  under  trusted  interoperation. 

1.2  Overview  of  this  Part 

We  present  a  query  mediation  approach  to  the  interoperation  of  autonomous  heterogeneous 
databases  containing  data  with  semantic  and  representational  mismatches  [38].  We  develop  a  me¬ 
diation  architecture  of  interoperation  that  facilitates  query  mediation,  and  formalize  the  semantics 
of  query  mediation.  Queries  are  mediated  between  multiple  databases,  and  users  or  applications 
of  a  local  database  access  data  in  multiple  databases  using  the  local  language  and  schema,  making 
both  the  data  and  the  applications  accessing  the  data  in  legacy  databases  interoperable.  Queries 
are  automatically  mediated,  relieving  users  from  the  difficult  task  of  resolving  semantic  and  repre¬ 
sentational  mismatches.  Semantic  heterogeneity  is  separated  from  representational  heterogeneity 
by  representation  minimization  techniques,  reducing  the  space  of  heterogeneity,  and  improving  the 
efficiency  of  automated  query  mediation. 
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Chapter  2 

The  Database  Interoperation 
Problem:  An  Example 


Imagine  a  typical  database  user,  U.  User  U  knows  a  great  deal  about  the  information  stored  in 
her  database,  including  the  details  of  how  it  is  stored.  If  the  database  is  relational,  then  U  is 
familiar  with  the  schemes  of  the  relations,  understands  the  semantics  of  the  attributes  that  appear 
in  those  schemes,  and  knows  the  details  of  the  way  values  of  those  attributes  are  represented.  U  also 
knows  how  to  write  tolerably  efficient  queries  in  the  appropriate  query  language — say,  SQL — for 
retrieving  data  of  interest  to  her.  But  U' s  knowledge  of  databases  is  limited  to  the  contents  of 
her  local  database  and  how  to  deal  with  it.  There  may  be  other  databases  containing  data  that 
are  quite  relevant  to  her  interests,  but  she  is  most  likely  unaware  of  their  existence,  and  almost 
certainly  lacks  the  skills  required  to  retrieve  and  interpret  that  data. 

The  goal  of  our  research  is  to  help  that  typical  user  U  retrieve  nonlocal  data,  without  requiring 
her  to  acquire  any  additional  knowledge  or  skills.  In  fact,  we  aim  to  change  her  way  of  working  as 
little  as  possible.  An  example,  which  we  will  build  upon  throughout  this  report,  should  help  make 
this  point  clearer. 

Suppose  that  our  database  user  U  is  a  physician  working  at  a  medical  clinic.  Recently,  she  has 
observed  several  unusual  allergic  reactions  to  an  experimental  drug,  XD2001,  being  tested  by  the 
clinic.  The  reactions  are  unusual  in  that  the  patients  who  experienced  them  had  been  using  the  drug 
for  a  considerable  length  of  time  without  a  reaction  prior  to  experiencing  the  recent  symptoms. 
Her  local  database  Ac  contains  information  about  patients’  allergic  reactions,  so  she  decides  to 
search  for  other  cases  of  unusual  reactions  to  XD2001  that  were  caused  by  treatments  this  year. 
Three  relations  in  the  database  are  relevant.  The  relation  PATIENTS-ALLERGY  records  incidents  of 
allergic  reactions  to  drugs,  the  relation  NOTES  holds  any  notes  entered  by  the  attending  physician, 
including  whether  an  allergic  reaction  was  unusual  or  unexpected,  and  the  relation  PATIENTS 
records  the  time  at  which  the  patient  was  treated,  from  which  the  time  of  the  allergic  reaction  can 
be  roughly  inferred.1  The  schemes  of  those  relations  are 

'These  relation  schemes  were  derived  from  the  THelper-II  database,  which  was  developed  by  Stanford  Medical 
School.  Note  in  particular  that  not  including  TRANSACTION-TIME  as  part  of  the  key  for  PATIENTS  is  not  a 
typographical  error.  Evidently,  the  intention  is  to  use  a  unique  id  for  each  visit,  so  as  to  enhance  confidentiality  by 
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PATIENTS  II  PATIENTJD  I  TRANS  ACTION. TIME 


PATIENT-ALLERGY  II  PATIENT-ID  I  DRUG-NAME  I  NOTEJD  START-TIME 


NOTES  II  NOTE-ID  I  TEXT 


where  the  meaning  of  the  attributes  should  be  largely  self-explanatory.  (Boldface  type  indicates  a 
primary  key  for  each  relation.)  An  SQL  query  Qc  on  database  Ac  that  will  retrieve  the  desired 
information  is 


SELECT  PATIENT. ALLERGY. PATIENT. ID,  TEXT 

FROM  PATIENT.ALLERGY ,  PATIENTS,  NOTES 

WHERE  PATIENT. ALLERGY. PATIENT. ID  =  PATIENTS .PATIENT.ID 

AND  PATIENT. ALLERGY. NOTE. ID  =  NOTES. NOTE.ID 

AND  DRUG.NAME  =  ’XD2001’ 

AND  TIMESTAMP  ’1994-01-01  08:00:00’  <  TRANSACTION.TIME ; 


Unknown  to  our  physician  U ,  a  research  hospital  that  is  also  testing  the  drug  has  observed 
similar  unusual  reactions  and  recorded  the  fact  in  its  database  AH.  Of  course,  AH  is  organized 
somewhat  differently  than  Ac,  uses  somewhat  different  value  representations,  and  contains  some¬ 
what  different  information  about  the  events,  so  query  Qc  could  not  be  used  to  retrieve  the  relevant 
data  from  AH.  Rather  than  storing  physicians’  notes  about  allergies  in  a  separate  relation,  the 
DRUG-ALLERGY  relation  in  AH  contains  a  text  field.  Also,  even  less  precise  time  information  is 
available  in  AH.  Like  Ac,  AH  was  designed  on  the  assumption  that  the  allergy  data  needed  no  time 
associated  with  it:  the  standard  use  of  DRUG -ALLERGY  is  to  check  whether  a  patient  is  known 
to  be  allergic  to  penicillin,  or  sulfa  drugs,  or  whatever,  when  designing  a  course  of  treatment,  not 
to  record  information  discovered  about  allergies  during  treatment.  So,  just  as  in  Ac,  the  time  of 
the  observed  reaction  must  be  inferred.  Rather  than  the  relatively  precise  TRANSACTION-TIME 
stored  in  Ac,  the  only  time  information  in  AH  is  the  admission  and  release  times  stored  in  the 
admission  records  of  the  ADMISSIONS  relation.  Thus,  the  relevant  schemes  in  AH  are 


II  PATIENTJD 

|  ADMISSION-TIME  | 

|  PATIENT-NAME 

LjJ 

DRUG  .ALLERGY  II  PATIENTJD  |  DRUGJD  |  TEXT 


Our  physician  U  would  be  just  as  much  interested  in  seeing  cases  where  patients  admitted  to 
the  hospital  this  year — for  anyone  admitted  this  year  was  certainly  treated  this  year — exhibited  an 
allergic  reaction  to  XD2001,  which  is  referred  to  as  EXPERIMENTAL-DRUG-2001  in  Ah.  The  query 
Qh  that  U  would  like  to  issue  on  AH,  if  only  she  had  the  required  knowledge,  is 


SELECT  DRUG. ALLERGY. PATIENT. ID,  TEXT 
FROM  DRUG.ALLERGY,  ADMISSIONS 

WHERE  DRUG. ALLERGY. PATIENT. ID  =  ADMISSIONS. PATIENT.ID 
AND  DRUG.ID  =  ’EXPERIMENTAL.DRUG.2001 ’ 

AND  TIMESTAMP  ’1994-01-01  08:00:00’  <  ADMISSION.TIME; 

making  it  more  difficult  to  discover  the  identity  of  a  patient  via  analysis  of  data  associated  with  a  patient  id. 
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Although  there  are  many  problems  associated  with  database  interoperability,  by  the  database 
interoperability  problem,  we  mean  the  problem  faced  by  U.  A  solution  is  to  provide  U  with  the 
relevant  data  from  AK  when  she  issues  the  query  Qc  on  Ac,  without  requiring  that  she  learn 
anything  about  the  semantics  or  representation  of  data  in  AH,  or  even  requiring  that  she  know 
that  Ah  exists  and  contains  relevant  data.  Our  approach  to  this  solution  is  to  provide  automated 
support  for  resolving  the  semantic  and  representational  mismatches — in  our  example,  the  mismatch 
in  relation  schemes,  the  mismatch  in  value  representations,  and  the  mismatch  in  time  semantics — 
between  Ac  and  Aa. 
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Chapter  3 

Approaches  to  Interoperation 


A  database  system  bridges  the  gap  between  a  data  source  and  an  application  that  accesses  the 
data.  The  typical  organization  of  a  database  system  is  shown  in  Figure  3.1(a),  where  the  data  are 
stored  in  a  database,  the  database  is  managed  by  a  database  management  system  (DBMS),  and  the 
DBMS  supports  a  query  language  and  schema  through  which  the  application  formulates  queries  to 
access  the  data. 

The  ultimate  goal  of  the  interoperation  of  autonomous  heterogeneous  databases  is  for  multiple 
applications  to  share  multiple  data  sources.  There  are  two  aspects  of  such  sharing: 

1.  multiple  applications  sharing  the  same  data  source,  and 

2.  multiple  data  sources  accessed  by  the  same  application. 

The  centralized  database  approach  achieves  data  sharing  by  requiring  that  data  be  stored  in 
the  same  database,  and  that  applications  speak  the  same  language  and  schema,  as  is  shown  in 
Figure  3.1(b).  In  terms  of  our  scenario,  clinic  data  (e.g.,  transaction  time)  and  hospital  data  (e.g., 
admission  time)  must  reside  in  the  same  database,  and  the  physician  at  the  clinic  must  understand 
the  relationship  between  the  two  notions  of  time  in  order  to  access  both  data  sources. 

In  the  distributed  database  approach,  applications  still  must  speak  the  same  language  and 
schema,  but  data  can  be  stored  in  different  databases,  as  long  as  the  databases  are  managed  by 
the  same  DBMS,  as  is  shown  in  Figure  3.1(c).  In  terms  of  our  scenario,  clinic  data  and  hospital 
data  can  reside  in  two  databases,  but  the  two  databases  must  be  managed  by  the  same  DBMS, 
and  the  physician  at  the  clinic  still  must  understand  the  relationship  between  transaction  time  and 
admission  time  in  order  to  access  both  data  sources. 

In  the  federated  database  approach,  applications  still  must  speak  the  same  language  (federated 
language)  and  schema  (federated  schema),  but  databases  can  be  managed  by  different  DBMSs,  as 
is  shown  in  Figure  3.1(d).  In  terms  of  our  scenario,  clinic  data  and  hospital  data  can  reside  in  two 
databases  and  be  managed  by  two  DBMSs,  but  the  physician  at  the  clinic  still  must  understand 
the  relationship  between  transaction  time  and  admission  time  in  order  to  access  both  data  sources. 

Naturally,  the  next  step  in  the  evolution  is  a  mediated  database  approach,  where  applications 
can  speak  different  languages  and  schemas,  and  databases  can  be  managed  by  different  DBMSs,  as 
is  shown  in  Figure  3.1(e).  A  mediator  has  knowledge  about  different  languages  and  schemas,  and 
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the  relationships  between  them.  When  an  application  issues  a  query  in  one  language  and  schema, 
the  mediator  transforms  the  query  into  other  languages  and  schemas  using  its  knowledge,  thus 
enabling  the  application  to  access  multiple  databases  without  having  to  speak  multiple  languages 
and  schemas.  In  terms  of  our  scenario,  the  clinic  and  the  hospital  can  have  two  separate  database 
systems,  and  the  mediator  knows  that  transaction  time  is  somewhere  between  admission  and  release 
times.  The  physician  at  the  clinic  formulates  queries  in  the  language  and  schema  of  the  clinic 
database,  which  are  transformed  by  the  mediator  to  access  both  data  sources.  She  does  not  have 
to  know  that  the  hospital  database  uses  a  different  notion  of  time. 

The  biggest  advantage  of  the  mediator  approach  over  other  approaches  is  in  supporting  the 
interoperation  of  legacy  databases.  Users  of  a  legacy  database  can  access  multiple  databases  without 
having  to  learn  new  languages,  new  schemas,  or  their  relationships  and  differences.  Applications 
built  on  top  of  a  legacy  database  can  access  multiple  databases  without  having  to  be  recoded  with 
queries  in  new  languages  and  schemas.  Thus,  the  mediator  approach  provides  a  seamless  migration 
path  for  legacy  databases,  enabling  organizations  to  leverage  off  investments  in  legacy  data  and 
legacy  applications. 
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Application  Application  Application  Application  Application 


Data  Data  Data  Data  Data 


(a)  DB  Organization  (b)  Centralized  Approach  (c)  Distributed  Approach 


Application  Application  Application  Application 


Data  Data  Data  Data 

(d)  Federated  Approach  (e)  Mediated  Approach 


Figure  3.1:  Approaches  to  Interoperation 
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Chapter  4 

Mediation  Architecture 


The  history  of  database  evolution  as  depicted  in  Figure  3.1  demonstrates  that  data  sharing  does 
not  necessarily  mandate  the  sharing  of  system  components  [34]  (e.g.,  databases,  DBMSs,  schemas). 
As  long  as  autonomous  heterogeneous  databases  could  communicate  with  one  another,  they  could 
benefit  from  each  other’s  data  without  having  to  be  bound  to  a  common  system  component.  Hence, 
a  mediation  architecture  should  be  a  communication  architecture. 

With  the  advance  in  semantic  data  models  and  knowledge  base  systems,  data  processing  has 
evolved  into  intelligent  information  processing,  where  the  availability  of  semantics  and  knowledge 
about  data  greatly  enhances  the  capability  of  information  abstraction.  A  similar  evolution  from 
data  communication  into  intelligent  information  communication  is  essential  for  the  interoperation  of 
autonomous  heterogeneous  databases.  Data  should  not  be  communicated  as  raw  bits  (i.e.,  syntactic 
communication).  Instead,  they  should  be  mediated  (i.e.,  semantic  communication)  to  ensure  that 
data  from  the  sender  will  be  correctly  understood  for  processing  by  the  receiver.  Hence,  a  mediation 
architecture  should  be  a  mediator-assisted  communication  architecture. 


4.1  Components 

A  mediator  consists  of  the  following  components  that  together  support  the  interoperation  of  au¬ 
tonomous  heterogeneous  databases: 

•  Mediation  Language.  Communication  between  autonomous  heterogeneous  databases  must 
be  carried  out  in  a  mediation  language  or  interlingua.  As  pointed  out  in  [34],  this  type 
of  language  differs  from  the  representation  languages  (i.e.,  data  models)  of  participating 
databases.  A  representation  language  captures  the  knowledge  about  data  for  the  appropriate 
abstraction  and  efficient  representation  of  one  class  of  applications,  while  a  mediation  language 
captures  the  knowledge  about  data  for  the  meaningful  and  efficient  communication  between 
many  classes  of  applications. 

•  Knowledge  Base.  Meaningful  communication  between  autonomous  heterogeneous 
databases  is  based  on  the  relationships  between  participating  databases.  These  relation- 
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ships  capture  the  commonalities  and  mismatches  in  semantics  or  representations  between 
these  databases.  They  are  expressed  in  the  mediation  language  and  form  a  knowledge  base. 

•  Query  Tranformer.  A  mediation  language  alone  is  not  sufficient  to  ensure  meaningful  com¬ 
munication,  because  autonomous  heterogeneous  databases  might  contain  data  that  mismatch 
in  semantics  or  representations.  We  need  a  query  transformer  to  mediate  the  communica¬ 
tion  by  resolving  potential  mismatches.  Equipped  with  the  knowledge  base  of  relationships 
between  participating  databases,  the  query  transformer  accepts  queries  from  one  database, 
determines  which  other  databases  contain  relevant  data,  generates  queries  to  these  databases, 
and  mediates  resulting  data  back  to  the  original  database.  The  mediation  is  carried  out  in 
the  mediation  language. 

•  Wrappers.  Participating  databases  are  wrapped  up  by  interface  modules  to  forward  incom¬ 
ing  queries  to  the  query  transformer,  to  respond  to  queries  from  the  query  transformer,  and 
to  receive  answers  from  the  query  transformer.  Wrappers  are  largely  invisible  to  users,  and 
require  very  little  change  to  the  interfaces  of  participating  databases. 

•  Translators.  Since  the  representation  languages  of  participating  databases  very  likely  differ 
from  the  mediation  language,  we  need  translators  to  translate  queries  and  data  between 
these  representation  languages  and  the  mediation  language,  in  order  for  queries  and  data  to 
be  communicable  by  the  query  transformer. 

•  Conflict  Detectors.  When  related  data  from  multiple  participating  databases  are  merged 
to  give  answers  to  a  query,  conflicts  are  always  possible  because  the  merged  data  might 
be  inconsistent  with  respect  to  the  constraints  of  the  original  database  in  which  the  query  is 
specified.  We  need  conflict  detectors  to  detect  such  potential  problems.  The  conflict  detectors 
support  the  communication  of  globally  inconsistent  but  locally  reasonable  data  [34]. 

Figure  4.1  shows  the  mediation  architecture  (arrows  represent  dataflow),  with  three  autonomous 
heterogeneous  databases  interoperating  through  a  mediator  —  a  data  bus. 

4.2  How  Queries  Are  Mediated 

Suppose  that  users  issue  query  Qa  in  database  A  in  Figure  4.1,  expressed  in  the  language/schema 
of  database  A.  The  mediation  of  query  Qa  proceeds  as  fofiows: 

1.  Wrapper  A  intercepts  query  Qa,  and  sends  it  both  to  Query  Processor  A  to  get  answer  La 
and  to  Translator  A  for  mediation. 

2.  Translator  A  translates  query  Qa  to  query  Q'A  expressed  in  the  mediation  language,  and 
sends  it  to  the  Query  Transformer. 

3.  From  query  Q'A,  the  Query  Transformer  computes  a  mediated  query  Q'B  expressed  in  the 
mediation  language  baaed  on  its  knowledge  about  the  relationships  between  databases  A  and 
B,  and  sends  it  to  Translator  B. 
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Query  Data  Conflict? 
I  A  A 


Query  Data  Conflict? 


4.  Translator  B  translates  query  Q'B  to  query  Qb  expressed  in  the  language/schema  of  database 
B,  and  sends  it  to  Wrapper  B. 

5.  Wrapper  B  sends  query  Qb  to  Query  Processor  B  to  get  answer  Db  expressed  in  the  lan¬ 
guage/schema  of  database  B,  and  sends  it  to  Translator  B. 

6.  Translator  B  translates  answer  Db  to  answer  D'B  expressed  in  the  mediation  language,  and 
sends  it  to  the  Query  Transformer. 

7.  The  Query  Transformer  derives  answer  D A  expressed  in  the  mediation  language  from  answer 
D'b  based  on  its  knowledge  about  the  relationships  between  databases  A  and  B,  and  sends 
it  to  Translator  A. 

8.  Translator  A  translates  answer  D'A  to  answer  DA  expressed  in  the  language/schema  of  data¬ 
base  A,  and  sends  it  to  Wrapper  A. 
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9.  Wrapper  A  merges  answers  La  and  Dai  detects  conflicts  in  the  merged  answer,  and  presents 
it  to  users  as  the  answer  to  query  Qa- 

4.3  Discussion 

The  most  important  objective  in  designing  the  mediation  architecture  is  to  support  the  interoper¬ 
ation  of  legacy  databases  and  legacy  applications.  Does  the  architecture  of  Figure  4.1  achieve  this 
objective?  The  answer  is  yes.  A  legacy  database  (e.g.,  the  dotted  boxes  in  Figure  4.1)  could  be 
made  interoperable  by  wrapping  it  up  with  a  translator,  which  makes  the  database  talk  in  the  me¬ 
diation  language,  and  a  conflict  detector,  which  gives  users  the  option  of  being  notified  of  potential 
problems  in  query  mediation.  The  users  and  applications  accessing  data  in  the  legacy  database 
become  capable  of  accessing  data  in  multiple  databases  without  having  to  switch  first  to  a  new 
language  or  new  schema. 

Even  though  supporting  the  interoperation  of  legacy  databases  and  legacy  applications  is  the 
primary  benefit  of  the  mediation  architecture  in  Figure  4.1,  it  does  not  exclude  the  incorporation 
of  new  databases  or  new  schemas,  when  old  schemas  in  legacy  databases  do  not  meet  the  need  of 
a  new  application.  A  participating  database  could  be  a  virtual  one  containing  only  a  schema  but 
no  data,  serving  purely  as  an  interface  to  autonomous  heterogeneous  databases  (e.g.,  the  one  on 
the  right  in  Figure  4.1).  For  example,  an  application  designer  could  define  his  favorite  schema,  and 
specify  some  relationships  of  his  schema  with  other  participating  databases.  From  then  on,  users  of 
the  application  could  formulate  queries  in  this  schema,  and  get  meaningful  access  to  related  data 
in  other  databases  through  query  mediation. 

Although  we  assume  that  the  mediator’s  knowledge  is  given,  this  knowledge  does  not  have  to 
be  complete.  The  more  knowledge  the  mediator  has,  the  more  data  it  could  help  communicate. 
In  other  words,  a  participating  database  does  not  have  to  be  completely  definable  as  a  view  on 
other  participating  databases.  For  example,  suppose  that  two  databases  represent  allergy  and  test 
data  differently.  If  the  mediator  has  knowledge  about  how  the  allergy  data  in  two  databases  are 
related,  then  users  could  query  one  database  and  access  allergy  data  in  both  databases  through 
query  mediation.  If,  on  the  other  hand,  the  mediator  does  not  have  knowledge  about  how  the 
test  data  in  two  databases  are  related,  then  users  could  not  access  test  data  in  both  databases  by 
querying  only  one  database. 

We  emphasize  that  our  architecture  accommodates  the  federation  architecture  [51]  as  a  special 
case.  For  example,  the  virtual  database  on  the  right  in  Figure  4.1  could  be  considered  as  a  federated 
schema.  If  the  federated  schema  is  constructed  from  the  mediator’s  knowledge  base  by  removing 
semantic  and  representational  discrepancies  and  redundancies,  and  queries  are  mediated  only  in 
the  direction  from  the  federated  schema  to  databases  A  and  B,  then  we  get  a  federated  database 
in  which  all  queries  go  through  the  federated  schema  to  access  data  in  both  databases  A  and  B, 
as  shown  in  Figure  4.2  (arrows  represent  data  flow).  Comparing  Figures  4.1  and  4.2,  we  observe 
that  data  could  flow  from  DB  B  to  Wrapper  A  in  Figure  4.1,  because  queries  to  database  B  are 
mediated  to  database  A  to  access  related  data  in  database  A.  Such  data  flow  is  not  possible  in 
Figure  4.2,  because  queries  have  to  be  issued  to  the  federated  interface  in  order  to  access  data  in 
both  databases  A  and  B. 
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Figure  4.2:  Federation  Architecture 

We  also  emphasize  that  the  mediator’s  knowledge  base,  together  with  all  the  participating 
schemas,  should  not  be  equated  to  a  global  or  federated  schema,  for  the  following  reasons: 

1.  The  mediator’s  knowledge  base  is  at  most  a  very  poor  schema,  because  it  contains  semantic 
and  representational  discrepancies  and  redundancies. 

2.  The  semantic  and  representational  discrepancies  and  redundancies  in  the  mediator’s  knowl¬ 
edge  base  are  not  removed,  since  the  removal  would  violate  autonomy  and  would  have  high 
complexity. 

3.  The  mediator’s  knowledge  base  is  not  the  schema  with  which  users  interact. 

Point  (2)  above  shows  a  big  advantage  of  our  architecture  over  the  federation  architecture  in 
terms  of  automation:  users  or  database  designers  only  need  to  identify,  but  do  not  have  to  resolve, 
the  semantic  and  representational  mismatches  in  order  to  access  data  in  multiple  databases,  thus 
removing  a  big  hurdle  to  automation. 

Point  (3)  above  shows  another  important  advantage  of  our  architecture  over  the  federation 
architecture  in  terms  of  autonomy:  users  of  a  local  database  access  data  in  multiple  databases 
through  the  local  language  and  schema  instead  of  a  federated  schema  or  a  multidatabase  language. 
This  is  especially  appealing  for  legacy  databases:  both  the  data  and  the  applications  accessing  the 
data  are  interoperable. 

In  general,  mediators  are  knowledge  base  systems.  Since  it  is  unrealistic  to  expect  a  single, 
general-purpose  mediator  with  optimal  power  [5],  multiple  mediators  should  coexist  (just  like  the 
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coexistence  of  multiple  federated  schemas  in  the  federated  database  approach),  offering  information 
communication  services  at  various  levels  [34,  60].  These  mediators  could  differ  in  their  trade-offs 
between  communication  cost  and  capability  (bandwidth),  and  users  would  subscribe  to  the  services 
that  are  optimal  for  their  applications. 
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Chapter  5 

Query  Mediation 


We  choose  first-order  logic  as  our  mediation  language.  As  we  can  see  in  this  chapter,  query  medi¬ 
ation  becomes  logical  inference  in  first-order  logic,  and  often  in  an  even  more  efficient  fragment  of 
first-order  logic,  such  as  the  fragment  of  Horn  clauses.  Here,  we  focus  on  the  query  transformer, 
since,  for  relational  databases,  the  translation  between  relational  query  languages  (e.g.,  SQL)  and 
first-order  logic  is  reasonably  straightforward:  see  Chapter  6  for  an  example.  The  translation  be¬ 
tween  object  query  languages  (e.g.,  XSQL  [24])  and  first-order  logic  is  discussed  elsewhere  [40]. 
In  the  rest  of  this  chapter,  schemas,  databases,  and  queries  are  all  formulated  in  our  mediation 
language. 

5.1  Schemas  and  Databases 

Intuitively,  a  database  represents  a  perception  (called  the  perceived  world  [35]  or  the  model  world 
[48])  of  the  real  world.  Data  in  a  database  represents  the  knowledge  of  truth  values  of  statements 
about  the  real  world.  A  schema  specifies  the  vocabulary  in  which  data  are  expressed,  and  the 
invariant  properties  of  data.  It  also  supplies  a  context  within  which  queries  could  be  expressed 
meaningfully. 

Formally,  a  dependency  is  a  sentence  in  first-order  logic  of  the  form 

(Vxi)(Vx2) . .  .(Vxm).  pi  A  p2  A  •  ••  A  pk  D  (3yi)(3y2)  .  ■  .  (3yn)(qi  A  q2  A  •  •  •  A  qi) 

where  m,  n  >  0,  p,  is  an  atomic  formula  for  1  <  i  <  k,  and  qj  is  either  an  atomic  formula  or  an 
equality  (when  n  =  0  and  l  =  1)  for  1  <  j  <  l.  A  dependency  is  equality  generating  if  n  =  0,  /  =  1, 
and  qi  is  an  equality.  A  dependency  is  tuple  generating  if  qj  is  an  atomic  formula  for  1  <  j  <  l  [12]. 

A  schema  5  is  a  theory  (V,  C)  in  first-order  logic,  where  V  is  a  vocabulary  of  predicate  sym¬ 
bols  called  relation  schemes,  arguments  of  relation  schemes  are  called  attributes,  and  C  is  a  set 
of  equality-generating  or  tuple-generating  dependencies  expressed  in  V  and  called  integrity  con¬ 
straints. 

A  database  A  over  S  is  a  structure  over  V,  consisting  of  a  nonempty  domain  D  and,  for  every  n- 
ary  predicate  symbol  in  V,  an  assignment  of  that  predicate  symbol  to  a  mapping  from  its  attributes 
into  D.  Database  A  is  valid  if  it  is  a  model  of  S.  Given  two  databases  A\  and  A2  over  schemas 
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S\  —  (Vi,Ci)  and  S2  =  (V2,C2),  respectively,  such  that  V\  fl  V2  =  0,  A\  U  A 2  is  the  database  over 
(Fi  U  V2,  Ci  U  C2 )  that  assigns  the  same  value  to  every  predicate  P  in  VJ  as  Ai  does,  for  i  6  {1, 2}. 

For  our  scenario  in  Chapter  2,  the  schema  of  the  clinic  database  consists  of  three  predicate 
symbols,  Patients,  Patient -Allergy,  and  Notes,  together  with  integrity  constraints  such  as 

(Vz)(Vy)(Vz).  Patients(x,  y)  A  Patients(x,  z)  D  y  =  z 

(Vx)(Vy)(Vz)(Vu)(Vu).  Patient -Allergy (2,  y,  z,  u,  v)  D  (3u>)  Patients(x,  w) 

A  conjunctive  query  q  on  S  is  a  conjunction  of  atomic  formulas  over  V  with  a  (possibly  empty) 
list  of  free  variables.  A  logical  query  q  on  S  is  a  disjunction  of  conjunctive  queries  on  S.  Given  a  data¬ 
base  A  over  S  with  domain  D  and  a  logical  query  q  with  free  variables  Xi,X2, . .  ■ ,  xm,  the  answers  of 
q  in  A  are  the  m-tuples  (di, d2, . . . , dm)  in  Dm  such  that  q  is  true  in  A  when  variables  xi,x2,.  ■  .,xm 
are  assigned  the  values  di,d2,...,dm,  that  is,  such  that  D  }=  q[di/xi,  ^2/^21  •  •  •  ,dm/xm].  For  our 
scenario  in  Chapter  2,  the  SQL  query  Qc  on  the  clinic  database  is  equivalent  to  a  conjunctive  query, 
Qc- 

(3y)(3z)(3w)(3v) . 

Patients(z,  y) 

A  Patient-Allergy (z ,  xd2001,  z  ,w,  v) 

A  Notes(z,  u) 

A  1994/01/01/08/00/00  <  y). 

We  consider  the  interoperation  of  valid,  autonomous,  and  heterogeneous  databases  A,-  with 
domains  Dt  and  over  schemas  5,  =  (V,  Ci)  for  1  <  i  <  n,  where  Vi  fl  Vj  =  0  for  1  <  i  <  j  <  n.  We 
assume  that  Ai  is  empty  if  the  i-th  database  is  virtual.  The  mediator’s  knowledge  base  consists  of 
a  theory  S  =  (U”=i  V*  u  ^  C)  in  first-order  logic,  where  C  is  a  set  of  tuple-generating  dependencies. 
The  mediator’s  knowledge  captures  the  relationships  among  schemas  Si,  S2,  •  •  • ,  Sn,  which  specify 
how  data  in  databases  Ai,  A2, . . . ,  An  are  related  semantically.  For  our  scenario  in  Chapter  2, 
V  =  0,  and  the  mediator’s  knowledge  includes  sentences  such  as  the  following: 

xd2001  =  experimentaLdrug-2001 

(Vx)(Vy)(Vz). 

Drug_A.llergy (x,  y,  z ) 

D  (3u)(3v)(3w)  (Patient_Allergy(x,  y,  u,  v,  w )  A  Notes(u,  z )) 

(Vx)(Vy)(Vz)(Vw)(Vu) . 

Admissions(i,  y,  z,  w)  A  u  <  y  D  Patients(x,  y)  A  u  <  y 

The  first  formula  means  that  the  mediator  knows  that  xd2001  and  experimentaLdrug-2001  are  the 
same,  so  that  a  reference  to  one  can  be  replaced  by  a  reference  to  the  other  without  changing  the 
meaning  of  a  query.  Similarly,  the  second  formula  makes  an  assertion  about  the  relationship  among 
meanings  of  the  relations  mentioned,  not  about  the  content  of  the  databases.  It  means  that  every 
answer  to  the  antecedent  query 

Drug-Allergy (x,  y,  z) 

should  be  a  valid  answer  to  the  consequent  query 
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(3ii)(3v)(3tZi).  Patient_Allergy(x,  y,  u,  v,  w)  A  Notes(u,  z). 

In  general,  the  relationships  among  schemas  are  not  necessarily  pairwise — there  might  be  rela¬ 
tionships  involving  three  or  more  schemas  (see  Section  6.4  for  an  example). 

5.2  Properties  of  Query  Mediation 

Consider  the  interoperation  of  databases  A{  over  schemas  S,  =  ( Vi,Ci )  for  1  <  i  <  n.  Suppose  that 
the  mediator’s  knowledge  base  consists  of  theory  S  =  (U?=i  K'  G  Given  a  logical  query  q  on 

Si  with  free  variables  x\, x%, . . . ,  xm,  a  mediated  query  p  of  q  is  a  logical  query  on  the  combined 
schema  U"=i  StuS  =  ((J”=i  K  U  V,\J=X  Ct  U  C)  with  the  same  list  of  free  variables.  Notice  that, 
although  q  is  expressed  on  one  schema  Si,  p  could  encompass  multiple  schemas  from  .S'i ,  S2, . . . ,  Sn 
and  the  mediator’s  knowledge  base  S  (see  Section  6.4  for  an  example). 

A  mediated  query  p  is  sound  if  it  logically  implies  the  original  query  using  the  mediator’s 
knowledge.  Intuitively,  soundness  means  that  every  answer  of  the  mediated  query  should  be  a  valid 
answer  of  the  original  query.  This  is  formally  expressed  as  follows: 

UT-i  Ci  U  C  b  (V*i)(V*2)  •  •  .(V*m).  p  D  q. 

Naturally,  the  disjunction  of  sound  mediated  queries  is  also  a  sound  mediated  query.  A  mediated 
query  p  is  trivial  if  it  is  sound  even  when  the  mediator’s  knowledge  base  is  empty,  i.e,  when  C  =  0. 
Intuitively  trivialness  means  that  every  answer  of  the  mediated  query  is  obtainable  by  asking  the 
original  query: 

USU  Ci  (=  (V*i)(V*2)  . .  .(V*m).  p  D  q. 

A  mediated  query  p  is  complete  if  it  is  logically  implied  by  all  possible  mediated  queries  p'  of  q. 
Intuitively,  completeness  means  that  every  valid  answer  of  the  original  query  is  an  answer  of  the 
mediated  query: 

U?=1  Ci  1=  (V®i)(V*2) . .  .(Vxm).  p'  D  P 
for  every  mediated  query  p'  of  q. 

When  the  mediator’s  knowledge  base  is  empty,  any  logical  query  q  is  the  sound,  trivial,  and 
complete  mediated  query  of  itself.  However,  the  sound,  nontrivial,  and  complete  mediated  query 
of  q  does  not  exist.  In  general,  if  a  sound,  nontrivial,  and  complete  mediated  query  exists,  then  it 
is  always  unique  up  to  equivalence.  That  is,  if  p  and  p'  are  two  sound,  nontrivial,  and  complete 
mediated  queries  of  q,  then  they  are  equivalent: 

U?=1  Ci  t=  (Vx!)(Vx2) . .  .(Vzm).  p  =  P'. 

5.3  Meaning  of  Query  Mediation 

Again,  consider  the  interoperation  of  databases  A{  over  schemas  5,-  =  ( V,,Ci )  for  1  <  i  <  n. 
Suppose  that  the  mediator’s  knowledge  base  consists  of  theory  S  =  (U?=i  K'  U  V,C).  Given  a 
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logical  query  g  on  5i,  the  objective  of  query  mediation  is  to  replace  the  evaluation  of  q  in  A\ 
by  the  evaluation  of  the  sound,  nontrivial,  and  complete  mediated  query  p  of  q  in  the  combined 
database  UT=i  The  soundness  of  p  ensures  that  such  replacement  is  meaningful  with  respect 
to  the  constraints  in  Si,  S2,  ■  ■  . ,  Sn  and  the  relationships  in  S.  For  the  scenario  in  Chapter  2,  a 
mediated  query  qH  on  the  hospital  database  can  be  derived  from  logical  query  qc  and  the  mediator’s 
knowledge  in  Section  5.1, 

(3»)(B*)(3u>). 

Admissions^,  y,  z,  w) 

A  Drug_Allergy (a;,  experimentaLdrug-2001,  it) 

A  1994/01/01/08/00/00  <  y 

which  ensures  that  the  constant  xd2001  is  converted  to  experimentaLdrug-2001  before  it  is  com¬ 
pared  to  DRUGJD  values,  among  other  things.  When  translated  to  SQL,  query  qK  above  becomes 
the  SQL  query  QH  of  Chapter  2. 

If  the  mediated  query  p  is  trivial,  then  the  answers  of  p  are  contained  in  the  answers  of  the 
original  query  q ,  since  databases  A\,  A2, . . . ,  An  are  valid  and 

Ur=i  Ci  1=  (Va?i)(V®2)  •  •  .(V*m)» 

Hence  query  mediation  does  not  yield  additional  data.  For  the  scenario  in  Chapter  2,  suppose  that 
we  have  an  additional  relation  MEDICARE-PATIENTS  recording  those  patients  who  are  covered 
by  Medicare.  The  fact  that  every  Medicare  patient  is  a  patient  can  be  captured  by  the  integrity 
constraint 

(Vx).  MedicareJPatients(x)  D  (3y)  Patients(x,  y). 

From  query  qc  and  the  above  constraint,  we  could  derive  the  following  sound  mediated  query  on 
the  clinic  database,  which  is  a  trivial  one  because  its  answers  are  contained  in  the  answers  of  the 
original  query. 

(3j/)(3z)(3u>)(3iJ). 

Patients(x,  y) 

A  Medicare.Patients(x) 

A  Patient_Allergy(x,  xd2001,  z ,  w,  v) 

A  Notes(z,  u) 

A  1994/01/01/08/00/00  <  y. 

The  completeness  of  the  mediated  query  p  ensures  that  all  the  valid  answers  of  the  original 
query  q,  whether  they  are  in  databases  61, . . . ,  6„,  will  be  accessed  by  evaluating  p.  In  the  scenario 
of  Chapter  2,  the  disjunction  of  queries  qc  and  §H  is  a  complete  mediated  query,  ensuring  that  all 
patients  who  had  recent  allergic  reactions  to  drug  XD2001  are  accessed,  whether  they  are  recorded 
in  the  clinic  database  or  in  the  hospital  database. 


5.4  Semantics  of  Query  Mediation 

Consider,  once  again,  the  interoperation  of  databases  A,  over  schemas  Si  —  (L,  C.)  for  1  <  i  <  n. 
Suppose  that  the  mediator’s  knowledge  base  consists  of  theory  5  =  ((J"=i  L  U  V,  C ).  Every  equality- 
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generating  dependency  in  C\,C<i,  •••,(?„  is  a  definite  Horn  clause.  Through  skolemization,  every 
tuple-generating  dependency  in  C\,Ci , . .  .,Cn,  or  C  of  the  form 

(Vxi)(Vz2)--(Varm).  Pi  A--  -  Apt  3  (3t/i)(3y2)  •  •  •  (Byn)(gi  A  q2  A  •  •  •  Aqt) 

could  also  be  transformed  into  l  definite  Horn  clauses: 

qi[Mx1,X2,...,Xm)/yi\l<i<n  *-  Pl,P2,---,Pk 


qi[fi(x  l,x2,  Xm)/yi]i<i<n  <-  Pi,  P2,  .  •  • ,  Pk 

where  /,( x\,. . . ,  xm)  is  a  Skolem  function.  For  every  m- ary  predicate  symbol  P  in  Vi,  V2, . . . ,  Vn, 
or  V,  we  add  a  new  m- ary  predicate  symbol  Pq  and  the  following  definite  Horn  clause: 

P(x1 ,  X2,  ■  ■  ■ ,  xm )  *  Po(xx,  X2,  •  •  • ,  xm). 

A  deductive  database  (with  equality)  [13]  could  be  constructed  by  taking  these  Horn  clauses  as 
the  intensional  database  (IDB).  The  extensional  database  (EDB)  consists  of,  for  every  predicate 
P  in  VltV2,...,Vn,  or  V,  the  new  predicate  P0  whose  extent  is  the  relation  assigned  to  P  by 
A\ ,  A2,  .  .  . ,  An,  Of  A. 

Let  M  be  the  initial  model  of  this  deductive  database  [28].  Also  let  D  be  the  universe  of  M 
that  is  the  set  of  equivalence  classes  of  ground  terms  over  [J"-!  V]  U  V  U  {/1,. . . ,  /„},  and  G  C  D 
be  the  set  of  equivalence  classes  containing  ground  terms  over  U"=1  Vi  U  V.  Given  a  logical  query  q 
on  S\  with  free  variables  xi,X2, . .  .,xm,  the  definite  answers  of  q  in  M  are  the  answers  of  q  in  M 
that  are  in  Gm. 

Given  a  mediated  query  p  of  q  on  the  combined  schema  (J"=1  5,-  U  S,  if  p  is  sound,  then  every 
answer  of  p  in  the  combined  database  (J"=1  A,-  U  A  is  a  definite  answer  of  q  in  M.  If  p  is  trivial, 
then  every  answer  of  p  in  U"=1  A,-  U  A  is  an  answer  of  q  in  Aj  and  hence,  a  definite  answer  of  q  in 
M.  If  p  is  complete,  then  every  definite  answer  of  q  in  M  is  an  answer  of  p  in  (J"=1  A,-  U  A. 

When  the  IDB  of  this  deductive  database  is  bounded  [57],  there  is  a  logical  query  p  not  involving 
IDB  predicates,  such  that  every  answer  of  q  in  M  is  an  answer  of  p  in  M  and  vice  versa.  Hence 
there  is  a  logical  query  p'  on  the  combined  schema  (J"=1  5,  U  S,  such  that  every  definite  answer  of 
q  in  M  is  an  answer  of  p'  in  the  combined  database  (J"=1  A,-  U  A  and  vice  versa.  In  other  words,  p' 
is  the  sound  and  complete  mediated  query  of  q. 

In  general,  we  could  view  query  mediation  as  the  first-order  approximation  of  definite  answers 
in  the  initial  model  of  a  deductive  database,  which  is  formed  by  taking  participating  databases  as 
the  EDB,  and  by  taking  (the  skolemization  of)  the  mediator’s  knowledge  and  the  constraints  in 
participating  schemas  as  the  IDB.  The  more  complete  a  mediated  query  is,  the  closer  its  answers 
are  to  the  definite  answers  of  the  original  query  in  the  initial  model.  The  boundedness  of  the  IDB 
serves  as  a  sufficient  condition  for  the  existence  of  sound  and  complete  mediated  queries. 
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Chapter  6 

A  Prototype  Mediator 


We  have  implemented  a  prototype  mediator,  which  does  SQL  query  mediation  between  Oracle 
databases.  The  system  organization  is  shown  in  Figure  6.1,  where  the  shaded  box  is  to  be  imple¬ 
mented. 


Local  Remote 


Local  Remote 

Schema  Schema 


Figure  6.1:  Prototype  System  Organization 

We  illustrate  the  details  of  our  prototype  implementation  using  the  example  query  mediation  of 
Chapter  2,  from  the  SQL  query  Qc  on  the  clinic  database  Ac  to  the  SQL  query  QH  on  the  hospital 
database  AH. 
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6.1  Wrappers 


The  wrapper  surrounding  the  local  Oracle  database  intercepts  SQL  queries  on  the  local  database 
and  sends  them  to  the  SQL-to-logic  translator.  The  wrapper  surrounding  the  remote  Oracle  data¬ 
base  receives  mediated  SQL  queries  from  the  logic-to-SQL  translator  and  sends  them  to  the  remote 
database. 


6.2  Translators 

We  discuss  the  SQL-to-logic  translator  in  detail.  The  logic-to-SQL  translator  is  obtained  by  re¬ 
versing  the  direction  of  the  SQL-to-logic  translation. 

6.2.1  Step  1:  Parsing 

The  first  step  performed  by  the  SQL-to-logic  translator  is  to  parse  an  SQL  query,  which  results  in 
an  abstract  syntax  tree.  As  part  of  the  parsing  process,  omitted  attribute  qualifiers — such  as  the 
omitted  NOTES  qualifier  on  the  attribute  TEXT  in  query  Qc — are  inferred  and  recorded. 

6.2.2  Step  2:  Representation  Minimization 

A  schema  can  contain  a  great  deal  of  representations  that  are  artifacts  of  the  chosen  data  model. 
Eliminating  as  many  representations  as  possible  reduces  the  potential  representational  mismatches 
of  the  local  schema  with  the  remote  schema,  making  query  mediation  more  efficient. 

To  minimize  representations,  we  perform  lossless  decomposition  of  all  relations  in  a  schema  as 
much  as  possible.  When  the  integrity  constraints  in  the  schema  are  limited  to  key-based  functional 
dependencies,  representation  minimization  is  farely  straightforward.  Suppose  that  relation  R  has 
attributes  Aj,  X2,  ■ . . ,  Xm+n,  where  the  first  m  of  the  m  +  n  attributes  form  the  primary  key.  Let 
Ro  be  the  relation  with  attributes  X\,  X2, . . .,  Xm  that  is  obtained  from  R  by  projecting  onto  those 

key  attributes,  that  is,  R0  =  II xux2,...,xmR-  Similarly,  for  1  <  i  <  n,  let  R{  =  UXux2 . xm,xm+iR- 

Then  R  is  replaced  by  the  set  {Ro,  R\, . . Rn}  in  the  minimized  schema.  The  minimized  schema 
also  contains  the  following  integrity  constraints  for  1  <  i  <  n: 

(Vx1)(Vx2).  ■  .(Vxm)(Vy)(Vz).  Ri(x!,x2, . . .  ,xm,y)  A  Ri(x!,x2, . . .  ,xm,  z)  D  y  =  z 
(Vxi)(Vx2)  •  •  (Vxm)(Vj/).  Ri(xi,x2t.,.,xm,y)  D  Ro(x  i,x2,...,xm) 

An  SQL  query  on  the  original  schema  can  be  transformed  into  an  SQL  query  on  the  minimized 
schema  by  replacing  references  to  attribute  A,-,  where  1  <  i  <  m,  of  R  by  references  to  A,-  of  Rq 
and  by  replacing  references  to  attribute  A m+,-,  where  1  <  i  <  n,  of  R  by  references  to  Am+1  of  A,. 
So  our  original  query  Qc  is  equivalent  to  query  Qc 

SELECT  PATIENT. ALLERGY . PATIENT. ID ,  NOTES.TEXT . TEXT 
FROM  PATIENT. ALLERGY, 

PATIENTS ,  PATIENTS.TRANSACTION.TIME , 

NOTES,  NOTES.TEXT 
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WHERE  PATIENT. ALLERGY. PATIENT. ID  =  PATIENTS. PATIENT.ID 
AND  PATIENT.ALLERGY . NOTE.ID  =  NOTES .NOTE.ID 

AND  PATIENT. ALLERGY. DRUG.NAME  =  ’XD2001’ 

AND  TIMESTAMP  ’1994-01-01  08:00:00’ 

<  PATIENTS_TRANSACTION_TIME.TRANSACTION.TIME 
AND  PATIENTS. PATIENT. ID  =  PATIENTS.TRANSACTION.TIME. PATIENT.ID 

AND  NOTES. NOTE.ID  =  NOTES.TEXT. NOTE.ID; 


on  the  minimized  clinic  schema.  (The  name  “NOTES”  is  now  used  to  refer  to  the  projection  of 
the  original  NOTES  relation  onto  its  primary  key  NOTES.ID,  while  “NOTES.TEXT”  refers  to  the 
result  of  projecting  NOTES  onto  its  primary  key  NOTES JD  together  with  the  attribute  TEXT,  and 
similarly  for  the  other  relations.) 


6.2.3  Step  3:  Translation  to  Logic 

The  last  step  of  the  translator  replaces  the  abstract  syntax  tree  for  the  minimized  query  by  an 
equivalent  logical  query.  Relations  correspond  to  predicates;  attributes  correspond  to  variables1; 
selection  conditions  correspond  to  equalities  and  inequalities  between  variables;  selected  attributes 
correspond  to  free  variables,  and  non-selected  attributes  correspond  to  existentially  quantified  vari¬ 
ables.  The  minimized  SQL  query  Qc  above  is  translated  to  logical  query 


(3  ?pa.noteid) 

(3  ?pa.drug_name) 

(3  ?p.patientJd) 

(3  ?ptt.patient_id) 

(3  ?ptt. transaction-time) 

(3  ?n. noteid) 

(3  ?nt.note_id). 

Patient_Allergy(?pa.patientid,  ?pa. drug-name,  ?pa.note_id) 

A  Patients(?p. patient_id) 

A  Patients.Transaction_Time(?ptt. patient Jd,  7ptt.transaction.time) 
A  Notes(?n. note_id) 

A  Notes.Text(?nt. noteid,  ?nt.text) 

A  ?pa. patient-id  =  ?p.patientJd 
A  ?pa.noteid  =  ?n.noteJd 
A  7pa.drug.name  =  xd2001 
A  11994/01/01/08/00/00  <  ?ptt. transaction-time 
A  ?p. patient-id  =  ?ptt.patientid 
A  ?n. note-id  =  ?nt. note-id 


We  use  qualified  attribute  names  prefixed  by  *?’  as  variable  names,  to  improve  readability.  However,  we  will 
abbreviate  the  qualifiers— writing,  for  example,  “?p. patient  Jd”  rather  than  “7patient.patient.id”— to  reduce  the 
length  of  argument  lists.  In  our  examples,  no  ambiguity  results  from  such  abbreviation. 
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6.3  Query  Transformer 

The  query  transformer  derives  a  logical  query  on  the  remote  database  from  a  logical  query  on  the 
local  database.  The  result  of  the  query  transformer  is  passed  on  to  the  logic-to-SQL  translator. 


6.3.1  Step  1:  Logical  Simplification 

When  getting  from  the  translator  a  logical  query  on  some  minimized  schema,  the  query  transformer 
first  uses  its  knowledge  of  the  integrity  constraints  in  the  minimized  schema  to  simplify  the  query. 
Using  the  integrity  constraints  in  Section  6.2.2,  a  logical  query  of  the  form 

Ri(x  %rm  ®m+>) 

A  £0(2/1,  i/2,  •••,  2/m) 

A  Xi  =  2/1 
A  22  =  2/2 
A  ... 

A  xm  =  ym 
Ayk  <  a 
A  ... 


can  be  simplified  to 


£f(®  1,  ®2,  •  •  • ,  ®m,  ®m+i) 

A  Xk  <  a 
A  ... 


The  query  transformer  automatically  generates  all  such  rules  needed  to  eliminate  logical  redundancy 
introduced  by  minimization.  In  addition,  the  maintainer  of  the  mediator’s  knowledge  base  can  add 
simplification  rules  to  eliminate  any  logical  redundancy  in  the  original  schema.  After  simplification, 
the  logical  query  of  Section  6.2.3  becomes  qc, 


(3  ?pa.noteid) 

(3  ?pa.drug_name) 

(3  ?ptt.patientid) 

(3  ?ptt. transaction-time) 

(3  ?nt. noteid). 

Patient_Allergy(?pa. patient  Jd,  ?pa.drug_name,  ?pa. noteid) 

A  Notes_Text(?nt.noteid,  ?nt.text) 

A  Patients.Transaction_Time(?ptt.patientid,  ?ptt. transaction-time) 
A  ?pa.patientid  =  ?ptt.patientid 
A  ?pa. noteid  =  ?nt. noteid 
A  ?pa. drug-name  =  xd2001 
A  11994/1/1/8/0/0/0  <  ?ptt. transaction-time 


6.3.2  Step  2:  Representation  Transformation 

We  come  now  to  the  heart  of  the  mediator,  the  step  where  the  (minimized)  representation  used  by 
the  clinic  database  Ac  is  replaced  by  the  representation  used  by  the  hospital  database  AE.  The 
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informal  semantic  relationships  we  noted  between  the  schema  of  Ac  and  the  schema  of  AH ,  which 
were  formalized  in  Section  5.1  in  terms  of  equality  and  logical  implication,  are  implemented  in  the 
prototype  mediator’s  knowledge  base  by  rewrite  rules.  The  three  rules  relevant  to  our  example  are 

Patients(p)  — »  Admissions(p,  ?a.admissionJtime) 

Patient_Allergy(p,  d,  ni)  A  Notes_Text(n2,  x)  A  ni  =  n2 

— >  Drug_Allergy.Text(p,  d,  x ) 

Patients_Transaction_Time(p,  U)  A  t2  <  h  — *  Admissions(p,  ti)  A  t2  <  ti 

In  addition,  the  query  transformer  has  rules  for  changing  from  the  terminology  used  in  Ac  to 
that  used  in  An .  The  rule  relevant  to  our  example  is 

xd2001  — *■  experimentaLdrug_2001 


Note  that  there  are  no  rules  for  translating  patient  ids  in  Ac  to  patient  ids  in  AH.  Unless  there  is 
reason  to  believe  that  patients  are  consistently  identified  across  databases — the  clinic  and  hospital 
might  both  use  a  patient’s  SSN  as  the  id — logical  queries  containing  particular  patient  ids  cannot 
be  rewritten  from  one  representation  to  the  other:  a  logical  query  such  as 

Patients(md919c) 

where  md919c  is  a  particular  patient  id,  can  be  rewitten  to 

(3  ?a.admission .time) .  Admissions(md919c,  ?a.admissionJtime) 

using  the  first  rule,  but  the  mediator  recognizes  that  this  is  not  a  proper  query  over  AH  because 
md919c  is  not  necessarily  a  term  of  the  language  for  AH. 

Applying  these  rules  to  the  simplified  logical  query  qc  above,  which  defines  a  set  of  answers 
from  Ac,  produces  a  logical  query  qm  that  defines  a  set  of  answers  from  AH: 

(3  ?dat. drug-id) 

(3  ?a. patient-id) 

(3  ?a.admissionJtime). 

Drug-Allergy -Text(?dat.patientid,  ?dat.drug_id,  ?dat.text) 

A  Admissions(?a. patient  id,  ?a.admission_time) 

A  ?dat. patient-id  =  ?a. patient-id 
A  ?dat. drug-id  =  experimentaLdrug-2001 
A  11994/1/1/8/0/0/0  <  ?a. admission-time 

where  we  have  renamed  some  of  the  bound  variables  in  qn  to  make  the  structure  clearer.  Finally,  the 
logic-to-SQL  translator  translates  logical  query  qn  to  an  SQL  query  QH  on  the  minimized  hospital 
schema,  which  is,  in  turn,  transformed  into  QK  by  deminimization. 
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6.4  Multiple  Databases 

Suppose  that  rather  than  a  single  database  AH,  the  hospital  stores  its  information  in  several 
databases.  In  particular,  suppose  that  the  relation  ADMISSIONS  is  part  of  database  A^  and 
that  the  relation  DRUG-ALLERGY  is  part  of  a  different  database  A^\  In  that  case,  the  rules  for 
rewriting  from  Ac  to  A^  could  include 

Patients(p)  — *  Admissions(p,  ?a. admission Jtime) 

Patients_Transaction_Time(p,  ti)  At2  <  ti  — *■  Admissions(p,  ti)  At^  <ti 

while  the  rules  for  rewriting  from  Ac  to  Ah  could  include 

Patients(p)  — +  Drug_Allergy(p,  ?drug_allergy.drugJd) 

Patient_Allergy(p,  d,  ni)  A  Notes_Text(n2,  i)  A  ni  =  «2 

— *■  Drug_Allergy_Text(p,  d,  x ) 


An  attempt  to  rewrite  logical  query  qc  of  Section  6.3.1  into  a  logical  query  over  A^  would 
result  in 

(3  ?pa.drugjiame) 

(3  ?nt. note_id) 

(3  ?a.patient_id) 

(3  ?a.admissionJtime). 

Patient_Allergy(?pa.patientJd,  ?pa.drug.name,  ?pa.note_id) 

A  Notes_Text(?nt.noteJd,  ?nt.text) 

A  Admissions(?a. patient  Jd,  ?a.admissionjtime) 

A  ?pa.noteJd  =  ?nt.note_id 
A  ?pa.patientJd  =  ?a.patientJd 
A  ?pa. drug-name  =  xd2001 
A  11994/1/1/8/0/0/0  <  ?a. admission-time 

This  logical  query  does  not  define  a  set  of  answers  from  A^\  because  the  relations  PA¬ 
TIENT-ALLERGY  and  NOTES-TEXT  of  the  minimized  schema  of  Ac  are  still  mentioned.  Similarly, 
an  attempt  to  rewrite  qc  into  a  logical  query  over  A^  would  result  in 

(3  ?dat. drug-id) 

(3  ?ptt.patientJd) 

(3  ?ptt. transaction-time). 

Drug-Allergy_Text(?dat. patient_id,  ?dat. drug-id,  ?dat.text) 

A  Patients-Transaction_Time(?ptt. patient_id,  ?ptt. transaction-time) 

A  ?dat. patient_id  =  ?ptt.patientJd 
A  ?dat. drug-id  =  experimentalArug_2001 
A  11994/1/1/8/0/0/0  <  ?ptt. transaction-time 


which  does  not  define  a  set 


of  a!2) 


answers  due  to  the  mentioning  of  the  (minimized  version  of)  Ac 


relation  PATIENTS-TRANSACTION.TIME. 
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But  there  is  a  clear  sense  in  which  the  first  rewriting  attempt  tells  us  that  Ah  )  provides  good 
partial  information,  and  we  might  decide  to  pursue  the  matter  further.  If  we  then  attempted  to 
apply  the  rules  for  rewriting  from  Ac  to  to  that  logical  query,  we  obtain  qH.  If  we  have  reason 
to  believe  that  A^  and  A^  are  using  consistent  patient  ids,  so  that  the  comparison 

?dat. patient  id  =  ?a.patient_id 

makes  sense — more  generally,  if  we  have  a  method  for  converting  from  Ag  )  patient  ids  to  a|,  ) 
patient  ids  as  part  of  the  rule  set  for  rewriting  from  A^  to  Ah  ) — then  qH  is  easily  transformed  into 
a  query  on  A^\  a  query  on  Ar  \  and  some  “glue”  code  that  the  mediator  can  use  to  join  the  sets 
of  answers  to  those  two  queries  into  a  set  of  answers  to  qc.  By  using  this  technique,  the  mediator 
can  combine  information  from  multiple  databases  in  responding  to  a  query. 

Not  every  relation  among  multiple  databases  can  be  broken  down  into  relations  between  pairs 
of  databases  in  the  fashion  illustrated  in  our  example.  Suppose  that,  instead  of  using  consistent 
patient  ids,  A^  and  A^  use  different  ids;  say,  A^  uses  the  patient’s  SSN  rather  than  the  arbitrary 
unique  id  used  in  A^.  If  the  information  for  converting  between  a[(  )  ids  and  Ah  )  ids  is  stored,  for 
confidentiality  reasons,  in  the  relation  PATIENT-DATA  of  yet  a  third  database  a[{  \  then  the  rules 
for  rewriting  patient  ids  from  A^  to  A^  and  conversely  must  contain  references  to  that  relation. 
Thus,  the  rules  can  no  longer  be  thought  of  as  simply  relating  A^  and  A^\  The  rule  for  rewriting 
patient  ids  from  A^  to  A^  still  has  the  general  form 

(pattern  that  matches  A^  patient  ids) 

— >  (some  function  of  that  pattern  that  produces  a\^  patient  ids) 

but  the  function  now  involves  generating  and  executing  a  query  on  A^3),  rather  than  simply  per¬ 
forming  a  syntactic  transformation  of  the  left  hand  side.  The  result  of  looking  up  the  patient  s 
SSN  in  Ah3^  is  used  in  rewriting  the  query  on  A^1)  to  a  query  on  Ah'2).  Fortunately,  given  the  com¬ 
binatorics,  such  situations  are  rare.  The  point  is  simply  that  effective  query  mediation  can  require 
knowledge  of  semantic  relationships  among  several  different  databases,  not  just  between  pairs  of 
databases.2 


6.5  Prototype  Implementation 

The  mediator  is  written  in  Common  Lisp,  and  the  wrappers  are  written  in  C.  Communications 
between  the  wrappers  and  the  translators  are  implemented  by  a  combination  of  low-bandwidth 
IPCs  used  as  signals  and  text  files  that  contain  the  queries  and  answers. 

The  details  of  the  prototype  system  design  and  implementation,  as  well  as  a  set  of  demonstration 
scenarios,  are  presented  in  the  Appendix. 

2 The  federated  database  approach  would  translate  all  data  values  used  by  the  databases  into  a  single  privileged 
representation  that  is  regarded  as  expressing  the  semantics  of  the  others,  for  example,  universal  patient  ids.  The 
problem  with  this  approach  is  that  it  is  much  harder  to  design  and  maintain  such  a  comprehensive,  all-purpose 
representation  than  it  is  to  specify  the  semantic  relationships  between  some  given  databases  of  interest. 


29 


Chapter  7 

Related  Work 


The  dominating  approach  to  the  interoperation  of  heterogeneous  databases  has  been  that  of  the 
federated  database  [1,  51].  As  we  observed  in  Chapter  3,  users  and  applications  of  a  local  data¬ 
base  must  switch  to  a  federated  schema  or  a  multidatabase  language  to  access  data  in  multiple 
databases,  which  almost  always  involve  different  data  models  and  query  languages.  For  example, 
SIMS  [1]  requires  users  and  applications  to  access  multiple  data  sources  using  the  Loom  knowl¬ 
edge  representation  language  and  a  domain  model  encoded  in  the  Loom  knowledge  base.1  In  other 
words,  the  data  in  a  local  database  are  made  interoperable,  but  the  applications  that  access  the 
data  in  the  local  database  remain  not  interoperable,  since  these  applications  are  coded  in  the  data 
model  and  query  language  of  the  local  database.  This  is  especially  impractical  for  legacy  databases 
because  the  bulk  of  the  significant  investment  made  by  organizations  in  such  databases  is  in  the 
applications  that  access  the  data.  In  contrast,  both  data  and  applications  are  made  interoperable 
with  our  mediator  approach. 

In  the  federated  database  approach,  either  a  federated  database  administrator  or  a  user  must 
first  identify  the  semantic  and  representational  mismatches,  and  then  construct  a  federated  schema 
to  resolve  these  mismatches,  before  data  in  multiple  databases  could  be  accessed.  The  construction 
of  the  federated  schema  is  essentially  a  schema  integration  process  [3],  which  offers  little  hope 
for  automation  [49].  In  comparison,  our  mediator  approach  does  not  need  the  mismatches  to  be 
resolved  and  removed  in  the  form  of  an  integrated  schema. 

In  addition,  most  researchers  advocate  the  use  of  a  powerful  interoperation  language  in  federated 
databases  that  could  directly  express  all  the  representational  constructs  of  heterogeneous  databases 
[2,  9,  23,  25].  Although  mapping  heterogeneous  databases  into  constructs  of  the  language  becomes 
straightforward,  all  the  semantic  and  representational  mismatches  still  have  to  be  resolved  in  the 
language,  which  offers  little  hope  for  efficiency  because  of  the  rich  semantics  and  representations  of 
the  language.  For  example,  higher-order  logic  must  be  used  in  [14]  to  reason  about  the  equivalence 
of  heterogeneous  representations.  In  contrast,  we  advocate  using  first-order  logic  as  our  mediation 

'Theoretically  it  is  possible,  in  the  reference  architecture  of  [51],  to  have  external  schemas  whose  data  models  and 
query  languages  are  different  from  a  federated  schema.  However,  it  remains  open  whether  and  how  this  could  be 
done  with  the  federated  database  approach.  Moreover,  having  an  external  schema  identical  to  a  local  schema  would 
introduce  architectural  redundancy. 
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language,  which  is  more  efficient  than  higher-order  logics.  The  use  of  representation  minimization 
techniques  further  improves  the  efficiency  of  query  mediation. 

The  idea  of  information  processing  and  communication  via  intelligent  mediation  is  introduced 
in  [60]  as  a  framework  of  future  information  systems.  Meta-attributes  have  been  used  in  [47]  to 
specify  the  contexts  associated  with  attribute  values.  Relationships  between  contexts  are  encoded 
as  conversion  rules,  and  attribute  values  are  mediated  through  these  relationships  to  ensure  that 
they  are  meaningful  with  respect  to  their  contexts.  This  is  a  special  case  of  query  mediation, 
where  the  mediation  is  restricted  to  context  matching  and  value  conversion.  An  example  of  query 
mediation  from  object-oriented  databases  to  relational  databases  is  given  in  [42],  where  the  schemas 
and  the  relationships  between  schemas  are  encoded  as  rules  in  F-logic.  We  support  query  mediation 
in  its  full  generality,  including  the  mediation  of  high-order  object  queries  to  first-order  relational 
queries  [40].  Our  mediation  language  is  more  efficient  than  F-logic,  making  the  correctness  of  query 
mediation  much  easier  to  define  and  verify. 

It  is  first  observed  in  [59]  that  data  should  be  shared  in  some  mediation  language  with  minimal 
representational  bias.  There,  the  relational  model  is  proposed  as  such  a  language,  from  which 
object-oriented  views  are  compiled  by  binding  relational  data  to  object  templates.  The  relational 
model  has  been  used  as  the  mediation  language  for  resolving  domain  mismatches  [10]  and  as  the 
glue  language  for  interconnecting  software  components  [4].  In  [15],  first-order  logic  is  recommended 
as  the  language  for  knowledge  sharing.  Our  mediation  language  is  essentially  the  language  of  the  re¬ 
lational  model,  and  our  representation  minimization  techniques  further  reduce  the  representational 
bias. 
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Chapter  8 

Conclusion 


We  have  presented  a  query  mediation  approach  to  the  interoperation  of  autonomous  heterogeneous 
databases  containing  data  with  semantic  and  representational  mismatches.  We  have  developed  a 
mediation  architecture  of  interoperation  that  facilitates  query  mediation,  and  have  formalized  the 
semantics  of  query  mediation.  Queries  are  mediated  between  multiple  databases,  and  users  and 
applications  of  a  local  database  access  data  in  multiple  databases  using  the  local  language  and 
schema,  making  both  the  data  and  the  applications  accessing  the  data  in  legacy  databases  inter¬ 
operable.  Queries  are  automatically  mediated,  relieving  users  from  the  difficult  task  of  resolving 
semantic  and  representational  mismatches.  Semantic  heterogeneity  is  separated  from  representa¬ 
tional  heterogeneity  by  representation  minimization  techniques,  reducing  the  space  of  heterogeneity 
and  improving  the  efficiency  of  automated  query  mediation.  Our  approach  provides  a  seamless  mi¬ 
gration  path  for  legacy  databases,  enabling  organizations  to  leverage  off  investments  in  legacy  data 
and  legacy  applications. 

Much  research  remains  to  be  done.  First,  we  have  focused  on  three  components  of  the  mediation 
architecture,  namely,  the  mediation  language,  the  query  transformer,  and  the  translator  for  object 
query  languages  [40].  Research  is  needed  in  the  other  components  as  well  as  in  translators  for  other 
kinds  of  query  languages. 

Second,  we  have  assumed  that  the  knowledge  in  the  mediator’s  knowledge  base  is  available. 
How  to  obtain  such  knowledge  is  certainly  an  important  issue.  Although  the  acquisition  of  such 
knowledge  is  likely  to  be  a  highly  interactive  process,  automated  acquisition  tools  would  be  valuable. 

Third,  we  have  restricted  ourselves  to  constraints,  relationships,  and  queries  that  do  not  involve 
negation.  The  semantics  of  query  mediation  could  certainly  be  generalized  to  allow  negation,  as 
long  as,  for  example,  the  result  is  stratified  [57].  The  approach  could  also  be  easily  generalized  to 
the  interoperation  with  deductive  databases  containing  rules  in  addition  to  constraints. 

Finally,  research  is  needed  in  the  autonomous  optimization  of  mediated  query  evaluation.  Be¬ 
cause  of  the  autonomy  of  participating  databases,  the  mediator  often  does  not  have  access  to  the 
performance  information  that  is  crucial  in  query  optimization.  The  mediator  needs  a  cost  model 
that  is  independent  of  the  implementation  structures  of  participating  databases  [61].  Techniques 
are  also  needed  for  the  mediator  to  obtain  performance  information  by  querying  [11]. 
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Chapter  9 

Introduction 


As  more  multilevel  databases  are  built  and  connected  through  computer  networks,  a  wide  variety 
of  secure  data  sources  will  become  accessible,  A  big  challenge  presented  by  this  technology  is  the 
secure  interoperation  of  multilevel  databases  containing  data  with  mismatched  security  policies. 
Providing  secure  interoperation  of  multilevel  databases  not  only  makes  it  possible  to  reliably  share 
data  in  isolated  military  and  civilian  databases,  but  also  increases  users’  confidence  and  willingness 
in  such  sharing. 

9.1  Problem 

As  a  prerequisite  to  the  secure  interoperation  of  multilevel  databases  containing  data  with  mis¬ 
matched  security  policies,  the  security  policies  of  component  databases,  as  well  as  the  potential 
mismatches  between  them,  have  to  be  precisely  characterized.  Existing  literature  has  been  vague 
on  what  constitutes  a  security  policy,  its  content  ranging  from  high-level  specifications  such  as  the 
type  of  access  control  (mandatory  or  discretionary  access  control)  or  the  kind  of  model  (noninter¬ 
ference  or  Bell-LaPadula),  to  designer’s  belief  or  preferences  such  as  whether  polyinstantiation  is 
allowed,  to  low-level  specifications  such  as  the  number  of  levels  and  categories  allowed  in  a  lattice. 
A  formal  policy  framework  is  needed  within  which  security  policies  could  be  characterized  and 
compared  [20]. 

It  has  been  widely  accepted  that  a  mandatory  access  control  (MAC)  policy  consists  of  four 
components:  a  set  of  subjects,  a  set  of  objects,  a  lattice,  and  a  mapping  that  associates  levels  in 
the  lattice  to  subjects  and  objects  [27].  This  works  well  for  multilevel  operating  systems,  because 
objects  such  as  files  do  not  carry  semantics.  For  multilevel  databases  where  data  carry  semantics, 
the  same  mapping  of  levels  to  objects  such  as  elements  in  tuples  could  have  completely  different 
meanings  [52].  For  example,  consider  a  relation  SMD(Starship,  Mid,  Destination).  A  secret  label 
on  element  Rigel  of  tuple  (Enterprise,  101,  Rigel)  in  SMD  could  mean  that  the  fact  “Enterprise  is 
going  to  Rigel”  is  secret,  or  the  fact  “some  starships  are  going  to  Rigel”  is  secret,  or  even  the  word 
“Rigel”  is  secret.  This  confusion  suggests  that  something  critical  is  missing  with  the  traditional 
formulation  of  MAC  policies  in  multilevel  databases,  namely  the  semantics  of  object  labels.  This 
problem  is  crucial  in  the  secure  interoperation  of  multilevel  databases.  For  example,  if  the  secret 
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label  on  Rigel  means  that  the  fact  “some  starships  are  going  to  RigeP  is  secret  in  database  A, 
and  means  that  the  word  “Rigel”  is  secret  in  database  B,  then  unclassified  users  could  query  all 
existing  destinations  in  database  A  and  obtain  “Rigel”  through  interoperation  with  database  B. 
The  canonical  MAC  policy  for  federated  databases  proposed  in  [36]  does  not  solve  this  problem. 

The  formulation  of  a  MAC  policy  in  a  multilevel  database  often  includes  some  constraint  policies, 
such  as  the  labeling  policy  of  Seaview  [30]  and  the  classification  constraints  of  LDV  [18].  Constraints 
are  the  most  important  means  of  specifying  data  semantics.  However,  existing  multilevel  databases 
provide  neither  a  precise  definition  of  constraint  validity  nor  an  efficient  mechanism  of  constraint 
enforcement.  In  fact,  it  has  been  argued  [8,  32]  that  integrity  enforcement  is  in  fundamental  conflict 
with  secrecy  enforcement:  no  multilevel  databases  could  simultaneously  satisfy  both  integrity  and 
secrecy  requirements. 

An  important  characteristic  of  MAC  policies  is  the  upward  information  flow  in  the  lattice, 
which  indicates  the  believability  of  low  data  at  high  levels.  For  multilevel  operating  systems  where 
objects  do  not  carry  semantics,  low  data  are  always  believed  at  high.  For  multilevel  databases 
where  data  carry  semantics  expressed  by  constraints  however,  low  data  could  contradict  high  data. 
For  example,  if  we  require  that  high  SMD  tuples  have  unique  Mid  elements  and  (Enterprise,  101, 
Rigel)  is  a  high  tuple  in  SMD,  then  the  low  tuple  (Enterprise,  102,  Rigel)  in  SMD  could  not  be 
believed  at  high.  This  problem  suggests  that  upward  information  flow  should  be  constrained  in  the 
formulation  of  MAC  policies  in  multilevel  databases. 

Constraints  also  bring  about  the  danger  of  inference  channels.  Inference  channels  could  be 
obtained  either  by  knowing  the  constraints  enforced  by  a  database  or  by  observing  the  behavior 
of  a  database  in  enforcing  the  constraints.  For  example,  consider  another  relation  MT(MissionId, 
Type).  If  we  require  that  every  Mid  element  in  relation  SMD  refers  to  a  Missionld  element  in  MT, 
and  a  low  Mid  element  refers  to  a  high  Missionld  element,  then  low  users  could  infer  the  existence 
of  the  high  Missionld  element.  If  we  require  that  every  high  Mid  element  in  SMD  refers  to  a  low 
Missionld  element  in  MT,  then  the  attempt  to  delete  a  low  Missionld  element  referred  to  by  a  high 
Mid  element  would  either  cause  a  loss  of  high  data  or  enable  low  users  to  infer  the  existence  of  the 
high  Mid  element.  Thus  the  formulation  of  MAC  policies  in  multilevel  databases  should  provide 
additional  means  to  detect  and  remove  such  inference  channels. 

9.2  Overview  of  This  Part 

We  have  developed  a  formal  policy  framework  for  MAC  policies  in  multilevel  relational  databases, 
which  serves  as  the  basis  for  specifying  such  policies  and  for  characterizing  their  potential  mis¬ 
matches.  In  Chapter  10,  we  describe  our  framework  and  identify  the  components  of  MAC  policies. 
In  Chapter  11  we  introduce  the  (single- level)  relational  model  and  the  notion  of  atomic  decom¬ 
position,  which  will  be  used  repeatedly  in  the  following  chapters.  In  Chapters  12  through  14,  we 
investigate  in  detail  the  three  most  important  components  of  our  policy  framework. 

In  particular,  Chapter  12  presents  interpretation  policies  which  map  multilevel  relational 
databases  with  tuple-level  and  element-level  labeling  to  logical  theories  and  structures.  Chap¬ 
ter  13  presents  view  policies  as  means  to  constrain  upward  information  flow  in  the  lattice,  identifies 
desirable  properties  of  such  policies,  and  develops  a  view  policy  that  satisfies  these  properties. 
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Chapter  14  presents  update  policies  as  means  to  enforce  constraints  without  introducing  inference 
channels,  identifies  desirable  properties  of  such  policies,  and  develops  an  update  policy  that  satisfies 
these  properties. 

Our  framework  could  be  used  to  capture  and  resolve  the  MAC  policy  mismatches  in  the  secure 
interoperation  of  heterogeneous  multilevel  databases.  In  Chapter  15,  we  take  an  initial  step  in 
this  direction,  by  investigating  the  secure  interoperation  of  multilevel  databases  whose  MAC  poli¬ 
cies  mismatch  in  one  specific  component — the  lattice.  Finally,  Chapter  16  offers  some  concluding 
remarks  and  a  brief  discussion  of  future  work. 

Due  to  space  limitations,  formal  proofs  of  the  results  presented  in  Chapters  11  through  15  are 
not  included.  Interested  readers  can  find  them  in  [17,  37,  39,  41]. 
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Chapter  10 

A  Policy  Framework 


In  this  chapter,  we  first  develop  a  logical  foundation  of  multilevel  relational  databases.  We  then 
present  a  framework  of  MAC  policies  based  on  the  logical  foundation,  and  identify  the  components 
of  our  framework. 

10.1  A  Model-Theoretic  Formulation  of  Multilevel  Relational 
Databases 

A  state  of  the  world  could  be  envisioned  as  a  set  of  elements  linked  together  by  relationships. 
Information  in  a  state  of  the  world  is  the  knowledge  of  the  truth  value  of  a  statement  about  the 
state  of  the  world  [35],  which  could  be  either  an  elementary  fact  such  as  “Enterprise  is  on  mission 
#101  to  Rigel”  or  a  general  law  such  as  “starships  have  unique  missions”. 

A  relational  database  captures  a  finite  set  of  elements  linked  together  by  relationships.  Ele¬ 
mentary  facts  are  represented  as  tuples  in  relations,  and  general  laws  are  represented  as  integrity 
constraints.  For  example,  the  elementary  fact  “Enterprise  is  on  mission  #101  to  Rigel”  could  be 
represented  by  the  tuple  (Enterprise,  101,  Rigel)  in  relation  SMD,  and  the  general  law  “starships 
have  unique  missions”  is  represented  by  a  functional  dependency  SMD:  Starship  — ►  Mid. 

A  standard  model-theoretic  formulation  of  a  (single-level)  relational  database  is  to  interpret 
the  integrity  constraints  as  forming  a  first-order  theory,  and  the  relations  as  forming  a  first-order 
structure  of  the  theory  [35].  A  database  is  valid  if  the  structure  is  a  model  of  the  theory.  For 
example,  the  tuple  (Enterprise,  101,  Rigel)  is  interpreted  as  a  tuple  in  the  assignment  to  predicate 
SMD,  and  the  functional  dependency  SMD:  Starship  — *  Mid  is  interpreted  as  the  axiom 

(Vz,t/i,t/2,zi,Z2)(SMD(a;,i/i,zi)  A  SMD(z,  y2,  z-f)  -»■  y\  =  y-f). 

A  multilevel  state  of  the  world  is  a  family  of  states  of  the  world,  one  for  every  level  in  a  security 
lattice.  Information  in  a  multilevel  state  of  the  world  is  the  knowledge  of  the  truth  value  of  a 
statement  about  a  state  of  the  world  [55]  or  about  the  multilevel  state  of  the  world.  The  former 
could  be  either  a  classified  elementary  fact  such  as  “it  is  top-secret  that  Enterprise  is  on  mission 
#101  to  Rigel”,  or  a  classified  general  law  such  as  “it  is  confidential  that  starships  have  unique 
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missions”.  The  latter  could  be  a  general  law  on  classification  such  as  “starships  classified  at  all 
levels  have  unique  missions”. 

A  multilevel  relational  database  is  a  relational  database,  whose  integrity  constraints  are  called 
view  constraints ,  together  with  a  labeling  function  k  and  a  set  of  labeling  constraints.  The  labeling 
function  maps  every  object  in  the  database  —  relation,  attribute,  tuple,  element  in  a  tuple,  view 
constraint,  etc.  —  to  a  (possibly  empty)  set  of  levels  in  a  security  lattice.  For  a  multilevel  state 
of  the  world,  the  database  and  the  labeling  function  together  represent  the  family  of  states  of  the 
world,  and  the  labeling  constraints  represent  the  general  laws  on  classification1 .  For  example,  the 
tuple  (Enterprise,  101,  Rigel)  mapped  to  ts  by  k  represents  the  elementary  fact  “it  is  top-secret  that 
Enterprise  is  on  mission  #101  to  Rigel”.  As  a  view  constraint,  the  functional  dependency  SMD: 
Starship  — »  Mid  mapped  to  c  by  n  represents  the  general  law  “it  is  confidential  that  starships 
have  unique  missions”.  As  a  labeling  constraint,  the  functional  dependency  SMD:  Starship  — »  Mid 
represents  the  general  law  “starships  classified  at  all  levels  have  unique  missions”. 

The  above  observation  suggests  a  model-theoretic  formulation  of  multilevel  relational  databases 
as  follows.  A  multilevel  theory  is  a  triplet  (£,  {T1}i^l,C): 

1.  C  =  (L,  ;<)  is  a  security  lattice  where  L  is  a  set  of  levels  and  <  is  the  dominance  relation, 

2.  { Tl}i£L  is  a  family  of  first-order  theories  —  one  for  every  level  in  L,  each  of  which  representing 
the  view  constraints  classified  at  a  particular  level,  and 

3.  C  is  a- collection  of  axioms  representing  the  labeling  constraints. 

We  use  -<  to  denote  the  strict  dominance  subrelation,  and  ■<*  to  denote  the  transitive  closure  of  X. 
A  multilevel  structure  of  the  multilevel  theory  is  a  family  of  first-order  structures  {M1}i$l  where 
Ml  is  a  structure  of  theory  Tl. 

For  example,  the  tuple  (Enterprise,  101,  Rigel)  mapped  to  ts  is  interpreted  as  a  tuple  in  the 
assignment  to  predicate  SMDts  in  structure  Mts,  the  view  constraint  SMD:  Starship  — >•  Mid 
mapped  to  c  is  interpreted  as  the  axiom 


(Vx,yi,2/2,2i,^2)(SMDc(x,2/i,2i)  A  SMDc(z,y2,Z2)  -*  2/i  =  2/2) 
in  theory  Tc,  and  the  labeling  constraint  SMD:  Starship  — >  Mid  is  interpreted  as  the  axiom 
(V/1,/2  €  £)(Vx,  j/i,  j/2)  21, 22)(SMD;i(a:,  yi,zi)  A  SMD,2(a:, y2,22)  -*■  2/1  =  Vi) 

in  C. 

10.2  MAC  Policy 

We  restrict  ourselves  to  multilevel  relational  databases  whose  MAC  policies  have  the  simple  security 
property  and  the  ^-property  of  the  Bell-LaPadula  model  [27],  which  ensure  that  information  does 
not  flow  downward  in  the  lattice. 

'Since  labeling  constraints  themselves  are  not  objects  in  the  database,  they  are  not  labeled  by  the  labeling  function. 
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•  The  Simple  Security  Property.  A  process  is  allowed  a  read  access  to  a  tuple  only  if  the 
former’s  clearance  level  is  identical  to  or  higher  than  the  latter’s  classification  level  in  the 
lattice. 

•  The  ^-Property.  A  process  is  allowed  a  write  access  to  a  tuple  only  if  the  former’s  clearance 
level  is  identical  to  or  lower  than  the  latter’s  classification  level  in  the  lattice. 

Our  formulation  of  a  MAC  policy  in  a  multilevel  relational  database  has  seven  components: 

1.  a  lattice, 

2.  a  set  of  subjects, 

3.  a  set  of  objects, 

4.  a  mapping  of  subjects  and  objects  to  levels  in  the  lattice, 

5.  an  interpretation  policy, 

6.  a  view  policy,  and 

7.  an  update  policy. 

The  first  four  components  together  correspond  to  the  traditional  formulation  of  MAC  policies  in 
multilevel  operating  systems. 

An  interpretation  policy  maps  a  multilevel  relational  database  to  a  multilevel  theory  and  a 
multilevel  structure  of  the  theory.  Through  this  policy,  the  superficial  syntactic  difference  in  object 
labels  is  abstracted  away,  and  the  semantic  difference  hidden  in  object  labels  is  made  precise.  As 
a  consequence,  the  interpretation  policy  makes  it  possible  to  compare  the  semantics  of  multiple 
MAC  policies. 

A  view  policy  consists  of  a  set  of  view  constraints  and  a  specification  of  the  validity  of  view 
constraints.  This  policy  specifies  the  upward  information  flow  requirements. 

An  update  policy  consists  of  a  set  of  labeling  constraints,  a  set  of  updates,  and  a  specification 
of  the  enforcement  of  labeling  constraints  in  performing  the  updates.  This  policy  specifies  the 
mechanisms  to  eliminate  inference  channels  in  the  enforcement  of  labeling  constraints. 

In  the  rest  of  this  report,  we  develop  the  last  three  components  of  our  policy  framework,  using 
examples  from  multilevel  relational  databases  based  on  the  lattice  in  Figure  10.1. 
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Chapter  11 

Relational  Model 


Before  defining  the  multilevel  relational  model,  we  need  to  define  the  (single-level)  relational  model. 
Following  the  practice  of  most  existing  approaches,  we  consider  the  relational  model  [56]  extended 
with  two  important  classes  of  constraints:  key-based  functional  and  referential  dependencies.  We 
then  develop  the  technique  of  atomic  decomposition,  and  characterize  the  information  content  of 
relational  databases  using  the  technique.  The  results  obtained  here  will  be  used  repeatedly  in  the 
following  chapters. 

11.1  Basic  Notations 

Let  U  be  a  finite  set  of  attributes.  If  X,Y  are  subsets  of  U,  then  XY  denotes  the  union  of  X,  Y. 
If  A  £  U,  then  XA  denotes  X{A).  A  relation  scheme  (in  Boyce-Codd  Normal  Form)  R[X,K]  is 
a  set  of  attributes  X  C  U  named  R  with  nonempty  primary  key  K  Cl.  A  database  schema  is  a 
pair  (1Z,C),  where  1Z  =  {jR,[X;,  -fif»]}i<»<n  is  a  family  of  relation  schemes  and  C  is  a  set  of  key-based 
referential  dependencies'. 

1.1  Every  referential  dependency  in  C  has  the  form  i2,[E]  <— ►  Rj,  where  1  <  i,j  <  n,  Y  =  K{  or 

Y  Cl  -  K{ ,  and  |F  |  =  \Kj\.  Y  is  a,  foreign  key  in  relation  scheme  R,  to  relation  scheme  Rj. 

1.2  Distinct  foreign  keys  in  the  same  relation  scheme  are  disjoint.  In  other  words,  Y  —  Z  or 

Y  n  Z  =  0  for  1  <  i,  j,  k  <  n  and  Ri [F]  «-*■  Rj,  Ri[Z]  <— *■  Rk  in  C. 

For  relation  scheme  R{[Xi,Ki\  in  1Z  and  attribute  A  £  A,-,  A  is  a  nonkey  attribute  if  A  £  Ki  and 
A  $  Y  for  any  foreign  key  Y  in  R{.  Figure  11.1  shows  a  schema  with  two  relation  schemes  SMD 
and  MT,  where  boxes  represent  relation  schemes,  attributes  to  the  left  of  double  lines  form  primary 
keys,  and  arrows  between  boxes  represent  referential  dependencies. 

Let  V  be  a  (possibly  infinite)  set  of  values.  A  tuple  over  attributes  X  is  a  partial  mapping 
f[A]:  IhD  that  assigns  values  from  V  to  attributes  in  X .  For  attribute  A  £  X ,  t[A]  denotes  the 
value  assigned  to  A  by  t[X],  and  f[A]  =  L  denotes  that  t[A]  is  undefined1.  For  attributes  Y  C.  X, 

1  We  distinguish  between  unknown  nulls  and  not- applicable  nulls.  The  symbol  L  represents  unknown  nulls.  Un¬ 
known  nulls  say  that  some  elementary  facts  are  missing  from  the  database.  Because  a  database  is  not  a  part  of  the 
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SMD 


Starship  Mid  Destination 


MT  ^ 

Missionld 

Type 

Figure  11.1:  A  Schema 


t[Y]  denotes  the  partial  mapping  whose  domain  is  restricted  to  attributes  in  Y.  For  tuple  t  over 
X ,  t[X]  =  Z  denotes  that  t  is  empty:  t[A\  =  Z  for  all  attributes  A  £  X;  and  t[X]  7Z  L  denotes  that 
t  is  total:  t[A\  ^  L  for  all  attributes  A  £  X. 

A  relation  r  over  relation  scheme  R[X,K]  is  a  set  of  tuples  over  X.  For  attributes  Y  C  X, 
r[y]  denotes  the  set  of  tuples  t[Y]  where  t  £  r.  Relation  r  is  valid  if  it  satisfies  the  key  integrity 
property: 

2.1  for  every  tuple  t  £  r,  t[K]  ±  Z,  and 

2.2  for  every  pair  of  tuples  t,t'  £  r,  t[K ]  =  t'[K]  implies  t  =  t' . 

In  other  words,  tuples  with  the  same  primary  key  value  are  identical. 

A  database  b  over  database  schema  (11,  C ),  where  1Z  =  A',]}i<t<„,  is  a  family  of  relations 

{r,}i<,<n,  where  r,-  is  a  relation  over  Ri[X{,  K{].  It  is  r-valid  if  every  relation  in  b  is  valid.  It  is 
valid  if  it  is  r-valid  and  satisfies  the  referential  integrity  property  for  every  referential  dependency 
Ri  [F]  Rj  in  C  and  tuple  t  £  r;: 

3.1  either  f[y]  =  L  or  t[y]  7^  Z,  and 

3.2  if  t[Y]  ,Z  Z  then  there  is  a  tuple  t'  £  rj  such  that  f[y]  =  t'[Kj]. 

In  other  words,  every  non-null  foreign  key  value  refers  to  an  existing  primary  key  value.  V  is  the 
universe  of  b.  Below  is  a  valid  database  over  the  schema  of  Figure  11.1. 


Starship 

Mission 

Destination 

Enterprise 

101 

Rigel 

Voyager 

102 

Talos 

Discovery 

103 

Rigel 

Missionld 

Type 

101 

spy 

102 

explore 

103 

mine 

For  y  Cl,  the  total  projection  of  relation  r[X]  to  Y,  denoted  as  IIyr[X],  is  defined  as  the  set 
of  tuples  t[Y]  such  that  t[y]  £  r[y]  and  f[y]  7Z  Z. 

state  of  the  world  that  the  database  tries  to  capture,  unknown  nulls  do  not  represent  elementary  facts  in  the  state  of 
the  world. 
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11.2  Atomic  Decomposition 

Every  tuple  in  a  database  represents  an  elementary  fact.  Often,  the  elementary  fact  represented 
by  a  tuple  is  a  conjunction  of  several  smaller  elementary  facts.  For  example,  tuple  (Enterprise, 
101,  Rigel)  represents  the  elementary  fact  “Enterprise  is  on  mission  #101  to  Rigel”,  which  is  the 
conjunction  of  two  smaller  elementary  facts  “Enterprise  is  on  mission  #101”  and  “Enterprise  goes 
to  Rigel”. 

Let  B  =  (7 Z,C)  be  a  schema  where  TZ  =  {Rt[X,,  if,]} !<,•<„.  The  atomic  decomposition  of  B  is  a 
schema  Ba  consisting  of  the  following  set  of  relation  schemes  TZa: 

•  Rf[Ki,Ki]  for  every  K,}  in  TZ, 

•  Rf [IUY,  K{ }  for  every  72;  [A,-.  Kf  in  TZ  and  foreign  key  Y  in  72,  where  Y  C  Xt  -  Ki,  and 

•  Rf[I(xA,  Ki]  for  every  72, [X,,  I(x]  in  TZ  and  nonkey  attribute  A  G  X{  -  Kt; 
and  the  following  set  of  key-based  referential  dependencies  Ca: 

•  Rf[Ki]  «->■  Rf  and  Rf[K,}  «-»•  Rf  for  every  Rf[Kx,  K%],  Rf [KXY,  Kt),  and  Rf[K{A,  Ki], 

•  Rf[Ki]  c-»  R,f  if  Ki  is  a  foreign  key  in  R{  to  Rj,  and 

•  Rf  [y]  <— >  Rf  for  every  foreign  key  Y  in  Rx  to  Rj  where  Y  Cl,  -  Kx. 

In  other  words,  the  atomic  decomposition  of  a  schema  consists  of  a  relation  scheme  for  the  primary 
key,  a  relation  scheme  for  every  foreign  key,  and  a  relation  scheme  for  every  nonkey  attribute. 
Figure  11.2  shows  the  atomic  decomposition  of  the  schema  of  Figure  11.1. 


S  I  M 


Figure  11.2:  An  Atomic  Decomposition  of  Schema 

From  every  database  b  over  B  we  could  construct  a  unique  database  ba  over  the  atomic  decom¬ 
position  Ba  of  B  as  follows.  From  every  relation  r,  €  b  over  Ri[Xx,  A',]  in  B,  we  construct  relations 
rf  =  n^r„  rf  =  and  rf  =  II K.A^i  in  ba  over  Rf ,  Rf ,  and  Rf  respectively. 

Since  b  and  ba  capture  the  same  elementary  facts,  B  and  its  atomic  decomposition  Ba  are 
semantically  equivalent  [41].  Notice  that  every  tuple  in  b  is  in  general  broken  into  several  smaller 
tuples  in  ba.  Therefore  every  elementary  fact  captured  by  b  is  equivalent  to  a  conjunction  of  perhaps 
several  smaller  elementary  facts  captured  by  6°. 

Notice  that  the  atomic  decomposition  of  a  database  does  not  contain  L  (by  the  definition  of 
the  total  projection  operator  II).  This  implies  that  null  values  in  a  database  do  not  represent 
elementary  facts  in  a  state  of  the  world,  which  coincides  with  our  intuition. 
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Furthermore,  the  atomic  decomposition  of  B  into  Ba  is  the  finest  possible  decomposition,  in 
the  sense  that  every  tuple  in  6°  represents  an  atomic  elementary  fact  whose  further  decomposition 
leads  to  loss  of  information.  For  example,  tuple  (Enterprise,  101)  represents  the  elementary  fact 
“Enterprise  is  on  mission  #101”,  while  tuples  (Enterprise)  and  (101)  represent  the  elementary  facts 
“there  is  a  starship  named  Enterprise”  and  “there  is  a  starship  on  mission  #101  respectively.  The 
conjunction  of  the  latter  two  is  not  equivalent  to  the  former. 

11.3  Information  Content 

Given  databases  b  and  b'  over  schema  B  with  relations  {7\'}i<i<n  an<^  {ri}i<t<n  respectively,  b  U  b 
denotes  the  database  {7',U7'(}i<t'<n  over  the  same  schema.  Database  6  is  a  subdatabase  of  b  ,  denoted 
as  6  C  b\  if  r;  C  r •  for  1  <  i  <  n.  Database  b'  is  more  informative  than  b,  denoted  as  b  C  b if  the 
atomic  decomposition  of  b  is  a  subdatabase  of  the  atomic  decomposition  of  b  .  In  other  words,  b  is 
more  informative  than  b  if  every  atomic  tuple  in  b  is  also  an  atomic  tuple  in  b' . 

Let  v,v'  be  either  values  in  V  or  L.  We  define  the  operators  0,0  on  v,  v'  as  follows,  where 
c®!)'  computes  the  nonconflicting  information  in  v,  v',  and  v  0  v'  computes  the  information  in  v 
and  the  nonconflicting  information  in  v'\ 

v  if  v  =  v'  or  v'  =  L 
v  0  v1  =  v'  if  v  —  L 
L  otherwise 

.  f  v  if  v  #  L 
v  Q)v  =  {  ,  , , 

1  v  otherwise 

Let  t ,  t'  be  tuples  over  X.  We  define  the  operators  ® ,  0  on  t,  t'  as  follows,  where  t®t'  computes  the 
nonconflicting  information  in  t,t' ,  and  t  Q)  t'  computes  the  information  in  t  and  the  nonconflicting 
information  in  t': 

{t  if  t  =  t'  or  t'  =  L 
t'  if  t  =  l 
L  otherwise 

,  _  ft  if  t  #  L 
~  (  t'  otherwise 

Let  B  =  (7 Z,C)  be  a  schema  where  7 Z  =  {I2t[X;, FG]}i <,•<„.  Given  relation  rt-  over  Ri[X{,Ki]  and 
tuples  t,t'  €  r{,  we  define  the  operators  0,©  on  t,t'  as”follows,  where  t  0  t'  computes  a  tuple  over 
X{  that  contains  the  nonconflicting  atomic  tuples  in  t,t',  and  t  ©  t'  computes  a  tuple  over  Xi  that 
contains  the  atomic  tuples  in  t  and  the  nonconflicting  atomic  tuples  in  t' .  Suppose  that  Z  is  either 
Ki,  or  a  foreign  key  Y  in  Ri,  or  a  nonkey  attribute  A  £  Xi: 

(t®t')[Z]  =  t[Z]®t'[Z] 

(tet')[z\  =  t[Z]Qt'[z\ 
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Given  two  relations  r,,  r[  over  R,[Xi,  Kt\,  let  r,-  ©  r-  denote  the  following  relation  over  Rt[Xi,  if,], 
which  computes  the  nonconflicting  information  in  r,-,  r[ : 

{t  ©  /'|Z  €  U  A  /'  £  r  •  A  t[Ki]  =  t'[Ki ]} 

U{/|Z  £  r,-  A  -i(3Z,)(Z/  £  r-  A  /'[if,]  =  Z[iif;])} 

G  r\  A  -i(3Z)(Z  £  r,-  A  Z[iift]  =  /'[if,])} 

and  let  rt  0  r'  denote  the  following  relation  over  R,[Xi,  if,-],  which  computes  the  information  in  r,- 
and  the  nonconflicting  information  in  r,-: 

{/  0  Z'|Z  E  r,-  At'  £  r[  A  Z[/ft]  =  /'[if,]} 

U{Z|Z  G  r,-  A  -i(3z')(z'  e  r'  A  =  Z[ff,])} 
u{z'|z'  e  r\  A  -i(3z)(z  eri  A  /[if,-]  =  /'[iif,])} 

Given  two  databases  b  =  {r,}i<,<n  and  b '  =  over  5,  let  b®b'  and  bQb'  denote  respectively 

the  databases  {r,-  ©  rt'}i<t<n  and  {r,-  0  r[}i<,<n  over  B.  Figure  11.3  shows  two  relations  SMDi  and 
SMD2  over  the  relation  scheme  SMD  of  Figure  11.1,  together  with  SMDj©  SMD2  and  SMDi© 
SMD2. 
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Figure  11.3:  Filtering  Functions 


Theorem  11.1  For  r-valid  databases  b  and  b'  over  B,  bQb1  is  an  r-valid  database  over  B  such 
that  b  C  6  0  b'  C  b  U  b',  and  c  C  6  ©  b'  for  every  r-valid  database  c  over  B  where  b  C  c  C  b  U  b' . 

Theorem  11.1  tells  us  that  6©  6'  is  an  r-valid  database  more  informative  than  b  but  less  infor¬ 
mative  then  bub',  and  is  the  (unique)  most  informative  such  database.2  In  Figure  11.3,  SMDi© 
SMD2  is  more  informative  than  SMDi  because  it  contains  the  tuple  (Discovery,  103,  Rigel).  It  is 
less  informative  than  SMDiU  SMD2  because  it  does  not  contain  the  atomic  tuple  (Enterprise,  102). 

2  By  restricting  ourselves  to  key-based  functional  and  referential  dependencies,  such  a  database  always  exists,  which 
is  not  necessarily  the  case  for  more  general  view  constraints. 
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Theorem  11.2  For  r-valid  databases  b  and  b'  over  B,  b  ®  b1  is  an  r-valid  database  over  B  such 
that  b  ©  b'  C  b  U  b' .  For  every  r-valid  database  c  over  B  where  b®b'  HcQbLlb',  there  is  an  r-valid 
database  c'  over  B  where  b  ©  b'  C  c'  C  b  U  b',  such  that  neither  cQc'  nor  c'  C  c. 

Theorem  11.2  tells  us  that  b  ©  b1  is  an  r-valid  database  less  informative  than  b  U  6'.  Moreover, 
any  such  database  that  is  strictly  more  informative  than  6®  b'  has  to  involve  a  random  choice:  there 
is  another  such  database  that  is  incomparable  in  information  content.  In  other  words,  b  ®  b  is  the 
(unique)  most  informative  such  database  that  does  not  involve  random  choices.2  In  Figure  11.3, 
SMDi®  SMD2  is  less  informative  than  SMDiU  SMD2  because  the  Mission  of  Enterprise  is  missing. 
Any  r-valid  database  strictly  more  informative  than  SMDi©  SMD2  but  no  more  informative  than 
SMDiU  SMD2  has  to  contain  either  (Enterprise,  101)  or  (Enterprise,  102)  but  not  both,  which 
involves  a  random  choice  between  the  two. 
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Chapter  12 

Interpretation  Policy 


An  interpretation  policy  maps  a  multilevel  database  to  a  multilevel  theory  and  a  multilevel  struc¬ 
ture  of  the  theory.  Through  this  mapping,  the  superficial  syntactic  difference  in  object  labels  is 
abstracted  away,  and  the  semantic  difference  hidden  in  object  labels  is  made  precise.  As  a  con¬ 
sequence,  the  interpretation  policy  makes  it  possible  to  compare  the  semantics  of  multiple  MAC 
policies. 

Here,  we  investigate  the  interpretation  policies  for  two  most  common  multilevel  databases, 
namely  multilevel  databases  with  tuple-level  and  element-level  labeling,  where  objects  are  tuples 
and  elements  in  tuples  respectively. 

Intuitively,  element-level  labeling  seems  to  be  more  expressive  than  tuple-level  labeling,  because 
it  is  more  fine-grained  in  capturing  classified  elementary  facts  in  a  multilevel  state  of  the  world.  On 
the  other  hand,  element-level  labeling  seems  to  be  more  complicated  and  difficult  to  implement  than 
tuple-level  labeling.  A  formal  characterization  of  the  expressive  power  of  these  labeling  mechanisms 
would  be  invaluable  in  making  design  decisions  such  as  which  mechanism  to  use  in  building  a 
multilevel  database. 

12.1  Tuple-Level  Labeling 

We  first  define  the  multilevel  relational  model.  A  multilevel  relation  scheme  is  a  pair  (R[X,  K],C), 
where  R[X,  K]  is  a  relation  scheme  and  £  is  a  security  lattice.  A  multilevel  database  schema  is  a 
pair  (/?,£),  where  B  is  a  database  schema  and  £  is  a  security  lattice. 

Let  (B,  £)  be  a  multilevel  schema,  where  B  =  (71, C),  71  =  and  £  =  ( L,< ). 

A  multilevel  relation  with  tuple-level  labeling  over  multilevel  relation  scheme  (R,[Xi,  K{],  £)  is  a 
pair  (7',-,/C,-),  where  r,  is  a  relation  over  R,[Xi,  K{}  and  k is  a  mapping  from  tuples  over  X,  to  sets 
of  levels  in  L,  such  that  «,-(£)  =  {}  if  and  only  if  t  ^  r,-,  and  l  £  /c* (#)  if  t  is  labeled  at  l  €  L. 

A  multilevel  database  with  tuple-level  labeling  over  multilevel  database  schema  (B,  £)  is  a  family 
{(r, ,  K,)}i<,-<n,  where  (r,-,/c,-)  is  a  multilevel  relation  with  tuple-level  labeling  over  (R,[Xt,  Kt],C). 
We  denote  it  by  the  pair  (6,  k),  where  b  =  {r,}i<,<„  is  a  database  over  B,  and  /c  =  {/c,}i<,<„  is  a 
family  of  mappings.  Figure  12.1  shows  a  multilevel  database  over  the  schema  of  Figure  11.1  and 
the  security  lattice  of  Figure  10.1.  The  labels  of  every  tuple  are  listed  to  the  right  of  that  tuple. 
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Figure  12.1:  A  Multilevel  Database  with  Tuple-Level  Labeling 

The  interpretation  policy  of  tuple-level  labeling  is  straightforward.  Because  every  tuple  in  a 
relation  represents  an  elementary  fact  in  a  state  of  the  world,  the  label  of  the  tuple  naturally 
represents  the  classification  of  the  elementary  fact. 

Since  functional  dependencies  are  visible  at  all  levels,  a  multilevel  relation  (r,-,  should  satisfy 
the  polyinstantiation  security  property: 

4  for  every  pair  of  tuples  t,t'  G  r,-  and  level  l  where  l  G  K,(t)  and  l  G  t[Ki]  =  t  [Ki\  implies 

t  =  t'. 

In  other  words,  tuples  labeled  at  the  same  level  satisfy  all  the  functional  dependencies.  A  multilevel 
database  (6,  k)  satisfies  the  polyinstantiation  security  property  if  every  multilevel  relation  in  (&,«) 
does.  Through  the  interpretation  policy,  polyinstantiation  security  properties  are  easily  mapped 
to  labeling  constraints.  For  example,  the  polyinstantiation  security  property  over  relation  scheme 
MT  of  Figure  11.1  and  the  lattice  of  Figure  10.1  is  mapped  to: 

(V/  G  £)(Vx,  y,  z)( MT'C®,  y)  A  MT'(j,  z)  -  y  =  z). 

Since  a  referential  dependency  i2,-[y]  t— ►  Rj  represents  a  relationship  between  tuples  in  multilevel 
relations,  and  knowing  a  relationship  between  two  tuples  requires  knowing  the  two  tuples  first, 
multilevel  relations  (r, •,«,■)  and  (rj,Kj)  should  satisfy  the  referential  security  property: 

5  for  every  tuple  t  G  r,-  and  level  l  G  there  is  a  tuple  t1  G  r j  and  a  level  l  G  Kj(t  )  such  that 

t[Y]  =  t'[Kj]  and  l'  l- 

In  other  words,  the  label  of  every  foreign  key  tuple  dominate  the  label  of  the  primary  key  tuple 
it  refers  to.  A  multilevel  database  (6,  k)  satisfies  the  referential  security  property  if  every  pair  of 
multilevel  relations  involved  in  a  referential  dependency  does.  The  multilevel  database  of  Figure  12.1 
satisfies  polyinstantiation  and  referential  security  properties.  Through  the  interpretation  policy, 
referential  security  properties  are  also  easily  mapped  to  labeling  constraints.  For  example,  the 
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referential  security  property  over  the  schema  of  Figure  11.1  and  the  lattice  of  Figure  10.1  is  mapped 
to: 

(VZi  G  £)(Vx,y,z)(SMDli(x,y,  z)  ->  (3 12  G  £)(3w)(/2  ■<  h  A  MT,2(y,  in))). 

12.2  Element-Level  Labeling 

Let  (B,£)  be  a  multilevel  schema,  where  B  =  (72., tJ),  72  =  {.Ri[X,-,X;]}i<t<n,  and  £  =  (L,<). 
A  multilevel  relation  with  element-level  labeling  over  multilevel  relation  scheme  (i?,[X,-,iift-],£)  is  a 
pair  (r,',0,),  where  r;  is  a  relation  over  i£,[X,-,  Ki]  and  0t  is  a  mapping  from  attributes  in  X,  and 
tuples  over  X,-  to  sets  of  levels  in  L ,  such  that  #,(A,  i)  —  {}  if  and  only  if  t  ^  r,  or  t[A\  =  Z1,  and 
l  G  0i(A,t)  if  t[A\  is  labeled  at  /  G  L.  When  0,(A,f)  =  6i{A',t )  for  ail  A,  A'  G  X,-,  we  denote  dx(A,t) 
by  6i(Xi,t). 

A  multilevel  database  with  element-level  labeling  over  multilevel  database  schema  ( B ,  £)  is 
a  family  {(r,-,  0t)}i<t<n,  where  (r,,  6%)  is  a  multilevel  relation  with  element-level  labehng  over 
(i£,[X,-,  Ki],  £).  We  denote  it  by  the  pair  (6,0),  where  b  =  {r,}i<,<n  is  a  database  over  B,  and 
0  =  { $, } i < ; < n  is  a  family  of  mappings.  Figure  12.2  shows  a  multilevel  database  over  the  schema 
of  Figure  "l  1.1  and  the  security  lattice  of  Figure  10.1.  The  labels  of  every  element  are  listed  as  a 
superscript  of  that  element. 
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Figure  12.2:  A  Multilevel  Database  with  Element-Level  Labeling 

As  we  observed  in  Chapter  9,  the  interpretation  policy  of  element-level  labeling  is  problematic. 
Since  elements  in  tuples  of  a  database  do  not  have  direct  correspondence  to  elementary  facts  in  a 
state  of  the  world,  it  is  unclear  what  the  correspondence  is  between  the  label  of  an  element  in  a 
tuple  and  the  classification  of  any  elementary  fact.  To  formulate  a  natural  interpretation  policy 
and  the  necessary  security  properties  of  element-level  labeling,  we  utilize  the  concept  of  atomic 
decomposition  from  Section  11.2. 

Let  ( b,6 )  be  a  multilevel  database  with  element-level  labeling  over  (B,  £).  Consider  the  atomic 
decomposition  Ba  of  B  and  the  atomic  decomposition  ba  of  b.  Notice  that  both  b  and  ba  capture 
exactly  the  same  elementary  facts,  and  every  elementary  fact  captured  in  b  is  a  conjunction  of 

‘Null  values  are  not  labeled,  which  is  natural  because  they  do  not  represent  elementary  facts  in  a  state  of  the 
world. 
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several  elementary  facts  captured  in  ba.  Hence  we  define  the  interpretation  policy  of  (6, 9)  by  the 
interpretation  policy  of  (6°,k)  —  a  multilevel  database  with  tuple-level  labeling  over  (B  ,  C).  In 
particular,  for  every  relation  r,-  £  b ,  tuple  t  £  r,-,  and  attribute  A  €  X;: 

f  Kf(f[x,])  if  A  e  Ki 
6i(A,t)  =  {  KY(t[KiY ])  if  A  £  Y  and  t[F]  f  L 
{  K?(t[KtA})  \it[A)^L 

Informally  the  interpretation  policy  says  that  the  labels  on  primary  keys  classify  their  existence, 
and  the  labels  on  foreign  keys  and  nonkey  attribute  values  classify  their  relationships  with  primary 
keys.  From  this  definition,  we  derive  the  key  classification  property  of  element-level  labeling: 

6.1  for  every  tuple  t  £  r,-  and  attributes  A,  A'  G  Ki,  0i(A,t)  =  9i(A',t),  and 

6.2  for  every  tuple  t  £  r,-,  foreign  key  Y  in  Ri,  and  attributes  A,  A!  G  Y ,  6i(A,t)  =  0i(A  ,t). 

In  other  words,  primary  and  foreign  keys  are  labeled  uniformly. 

From  the  polyinstantiation  security  property  of  tuple-level  labeling,  we  know  that  t[Ki]  =  t  [Ki] 
and  «f(t)  =  nf(t')  implies  t  =  t'  for  all  t,t'  £  rf .  Similarly  t[K,]  =  t'[Ki]  and  nf(t)  =  Kf(t') 
implies  t  =  t'  for  all  t,  t'  £  rf.  Hence  we  derive  the  polyinstantiation  security  property  of  element- 
level  labeling: 

7  for  every  pair  of  tuples  t,t'  £  r;  and  attribute  A  £  Xi  —  Ki,  we  have  that  t[Kt]  =  t'[Ki],0i(Ki,t)  = 

6i{Ki,t'),  and  6,(A,t )  =  6i{A,t')  implies  t[A]  =  t'[A]. 

In  other  words,  foreign  keys  or  nonkey  attribute  values,  which  are  labeled  at  the  same  level  and 
correspond  to  the  same  primary  keys,  are  identical. 

From  the  referential  security  property  of  tuple-level  labeling  and  referential  dependencies 
RY[Ki]  «-♦  R?,R?[Ki]  ^  Rf  in  Ca,  we  know  that  t[Ki]  =  t'  implies  nf{t')  <  kJ (t)  for  all 
t  e  ry,t'  £  rf  -  and  t[K{]  =  t'  implies  Kf(t')  <  K?(t)  for  all  t  £  rf,t'  £  rf.  Hence  we  derive  the 
primary  key  security  property  of  element-level  labeling: 

8  for  every  tuple  t  £  r,-  and  A  £  Xi  —  Ki  where  t[A]  ^  L,  6i(Ki,t)  0i(A,t). 

In  other  words,  primary  keys  are  dominated  by  foreign  keys  and  nonkey  attribute  values. 

Again  from  the  referential  security  property  of  tuple-level  labeling  and  the  referential  depen¬ 
dency  RY[Y]  ^  Rf  in  Ca,  we  know  that  t[Y]  =  t'  implies  nf(t')  <  kJ  ( t )  for  all  t  £  rf  ,t'  £  rf. 
Hence  we  derive  the  foreign  key  security  property  of  element-level  labeling  for  every  referential 
dependency  i2,[F]  Rj  in  C : 

9  for  every  tuple  t  £  r,-  where  t[Y]  ^  L ,  there  is  t'  £  rj  such  that  t[y]  =  t'[Kj\  and  6j(Kj,t )  X 

WO- 

In  other  words,  the  label  of  every  foreign  key  value  dominates  the  label  of  the  primary  key  value  it 
refers  to. 

All  properties  defined  in  this  section  have  been  identified  in  the  literature  as  desirable,  indicat¬ 
ing  that  our  interpretation  policy  for  multilevel  databases  with  element-level  labeling  is  natural.  In 
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particular,  our  properties  2.1,  6.1,  and  8  together  form  the  entity  integrity  as  defined  in  [22,  29]. 
Hence  our  definition  of  the  interpretation  policy  of  element-level  labeling  provides  a  semantic  jus¬ 
tification  of  entity  integrity.  With  the  natural  requirement  that  null  values  are  not  labeled,  our 
property  7  is  equivalent  to  the  PI-FD  property  of  [44].  Our  properties  3.1,  6.2,  and  9  together  pro¬ 
vide  a  formal  definition  and  semantic  justification  of  referential  integrity  in  the  multilevel  relational 
model  with  element-level  labeling. 

Figure  12.3  shows  a  multilevel  database  with  tuple-level  labeling,  over  the  schema  of  Figure  11.2. 
It  is  semantically  equivalent  to  the  multilevel  database  of  Figure  12.2,  because  the  two  are  mapped 
to  the  same  multilevel  theory  and  structure  according  to  our  interpretation  policies.  The  null  value 
in  Figure  12.2  has  disappeared  in  Figure  12.3. 
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Figure  12.3:  An  Atomic  Decomposition  of  Database 


12.3  Design  Trade- Off 

From  our  interpretation  policies  of  tuple-level  and  element-level  labeling,  we  can  conclude  that 
tuple-level  and  element-level  labeling  mechanisms  have  exactly  the  same  expressive  power,  because 
for  every  multilevel  database  with  element-level  labeling,  we  can  find  a  multilevel  database  with 
tuple-level  labeling  that  captures  exactly  the  same  information  in  a  multilevel  state  of  the  world, 
and  vice  versa. 

But  trade-off  does  exist  between  tuple-level  and  element-level  labeling  mechanisms  when  design¬ 
ing  a  multilevel  database.  On  one  hand,  element-level  labeling  is  more  complicated  than  tuple-level 
labeling,  since  labels  are  attached  to  elements  rather  than  tuples.  On  the  other  hand,  tuple-level 
labeling  requires  more  complicated  schemas  to  capture  the  same  amount  of  information  as  element- 
level  labeling,  making  query  and  update  more  expensive  because  the  same  elementary  fact  captured 
by  one  tuple  with  element-level  labeling  is  captured  by  several  tuples  with  tuple-level  labeling. 

To  simplify  discussions,  we  restrict  ourselves  to  multilevel  databases  with  tuple-level  labeling 
in  the  rest  of  the  report.  Because  of  the  equivalence  of  expressive  power  between  tuple-level  and 
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element-level  labeling  mechanisms,  the  results  can  be  easily  generalized  to  multilevel  databases 
with  element-level  labeling. 
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Chapter  13 

View  Policy 


A  view  policy  consists  of  a  set  of  view  constraints  and  a  specification  of  the  validity  of  view  con¬ 
straints.  This  policy  specifies  the  upward  information  flow  requirements  for  the  view  constraints, 
which  indicates  the  believability  of  low  data  at  high  levels. 


13.1  Sample  View  Policies 

According  to  the  Bell-LaPadula  model,  low  data  are  always  visible  at  high.  However,  since  low 
data  could  contradict  high  data,  visibility  should  be  distinguished  from  believability. 

The  filter  function  [21,  29]  and  the  security  logic  [16]  proposed  in  the  literature  take  one  extreme 
position  by  equating  believability  to  visibility,  thus  maximizing  believability.  However,  integrity 
is  compromised  if  a  low  tuple  contradicts  some  high  tuples  with  respect  to  the  constraints,  which 
leads  to  an  invalid  high  database.  For  example,  consider  the  following  multilevel  relation  over  the 
schema  of  Figure  11.1  and  the  lattice  of  Figure  10.1: 


Starship 

Mission 

Destination 

Enterprise 

L 

Talos 

Enterprise 

102 

Rigel 

Enterprise 

103 

Rigel 

When  querying  the  mission  of  Enterprise  at  level  T,  users  will  get  back  both  102  and  103,  which 
contradicts  the  constraint  “starships  have  unique  missions”. 

Smith  and  Winslett  proposed  a  belief-based  semantics  of  the  multilevel  relational  model  [53], 
which  defines  a  multilevel  relational  database  as  a  set  of  unrelated  single-level  relational  databases, 
one  for  every  level.  They  made  a  clear  distinction  between  visibility  and  believability,  and  took 
the  other  extreme  position  by  allowing  no  low  tuples  to  be  believable  at  high,  thus  minimizing 
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believability.  Their  semantics  serves  as  a  nice  framework  within  which  other  semantics  could  be 
compared.  However  a  multilevel  relational  database  that  directly  employs  their  semantics  would 
no  longer  be  multilevel  —  it  would  be  a  set  of  single-level  relational  databases  in  which  there  is  no 
upward  information  flow  across  levels.  For  example,  consider  the  following  multilevel  relation  over 
the  schema  of  Figure  11.1  and  the  lattice  of  Figure  10.1: 


Starship 

Mission 

Destination 

Enterprise 

102 

Rigel 

When  querying  the  mission  of  Enterprise  at  level  T,  users  will  get  back  an  empty  answer,  because 
no  information  about  Enterprise  is  considered  believable  at  that  level. 

Thuraisingham  first  formalized  the  distinction  between  visibility  and  believability  by  a  proof- 
theoretic  semantics  of  the  multilevel  relational  model  [55],  which  consists  of  a  nonmonotonic  infer¬ 
ence  rule  stating  that  low  data  are  believable  at  high  as  long  as  they  do  not  contradict  high  data. 
Given  two  low  tuples  labeled  incomparably,  what  happens  if  either  tuple  does  not  contradict  high 
data,  but  their  combination  does?  To  determine  what  is  believable  at  high,  the  result  of  Thurais- 
ingham’s  approach  would  depend  on  the  (random)  order  in  which  the  nonmonotonic  inference  rule 
is  applied  to  these  two  tuples,  which  introduces  ambiguity.  For  example,  consider  the  following 
multilevel  relation  over  the  schema  of  Figure  11.1  and  the  lattice  of  Figure  10.1: 


Starship 

Mission 

Destination 

Enterprise 

102 

Rigel 

Enterprise 

103 

Talos 

When  querying  the  mission  of  Enterprise  at  level  T,  users  will  get  back  either  102  or  103  but 
not  both.  It  should  be  noticed  that  such  problems  occur  even  with  a  totally  ordered  security  lattice, 
if  we  allow  arbitrary  constraints.  For  example,  a  constraint  could  state  that  there  should  be  no 
more  than  two  starships  going  to  Rigel.  If  we  have  one  high  tuple  (Enterprise,  101,  Rigel)  together 
with  two  low  tuples  (Voyager,  102,  Rigel)  and  (Discovery,  103,  Rigel),  then  at  most  one  low  tuple 
is  believable  at  high,  but  it  is  unclear  which  one  should  be. 

In  the  rest  of  this  chapter,  we  develop  a  view  policy  for  multilevel  databases  with  tuple-level 
labeling  where  the  view  constraints  consist  of  key-based  functional  and  referential  dependencies, 
which  overcomes  the  above-mentioned  problems. 


13.2  Validity  and  Views 

Intuitively  a  database  represents  one  view  of  a  state  of  the  world  (perceived  world  [35]),  while  a 
multilevel  database  represents  multiple  views  of  a  multilevel  state  of  the  world  one  for  every 
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level.  Furthermore,  these  multiple  views  are  related  by  the  security  lattice.  Contained  in  the  view 
at  a  level  are  tuples  believable  at  that  level.  Consequently  integrity  should  be  enforced  within  each 
view,  rather  than  across  multiple  views. 

What  tuples  belong  to  the  view  at  a  level?  First,  all  tuples  labeled  at  that  level  should  be  part 
of  the  view.  Second,  for  tuples  labeled  at  a  lower  level,  as  many  of  them  as  possible  should  be 
part  of  the  view  as  long  as  integrity  is  preserved,  in  order  to  maximize  sharing.  Third,  in  case  that 
either  but  not  both  of  two  low  tuples  could  be  in  a  high  view,  neither  of  them  should  be  in  the 
high  view,  because  the  high  view  lacks  further  information  to  justify  the  preference  of  one  over  the 
other.  In  other  words,  view  constraints  serve  as  a  filter  on  how  much  low  data  could  flow  high. 

Formally,  recall  from  Section  12.1  that  a  multilevel  schema  is  a  pair  (B,C),  where  B  =  (71,  C), 
71  -  {Ri[Xi,Ki\}i<i<n,  and  C  =  ( L,< ).  Let  (/>,«)  =  {(r,-,  Kj)}i<*<»  be  a  multilevel  database  over 
(B,C),  and  l  £  L  be  a  level. 

The  l- slice  of  multilevel  relation  (r,-,  k,),  denoted  as  07(77,  k,),  is  a  subrelation  of  r,-  defined  as 
{t\t  €  r,-  A  /  G  Ki(t)}.  The  l-slice  of  multilevel  database  (b,  k),  denoted  as  07(6,  k),  is  a  subdatabase 
of  b  defined  as  {<7/(77,  «.)}i<i<n-  The  /-slices  07 (r,,  «,•)  and  <t;(/>,k)  are  respectively  a  single-level 
relation  and  database  collecting  all  tuples  in  77  and  b  that  are  labeled  at  l.  The  T-slice  of  the 
multilevel  database  of  Figure  12.1  consists  of  the  first  tuple  in  SMD  and  no  tuples  of  MT. 

The  l-validity  of  (b,  k)  and  the  l-view  of  (b,  k),  denoted  as  q(b,  k)  =  {q(t7,  «t)}i<«<n  where 
<7(77,  Ki)  is  the  l-view  of  (77, k,),  are  defined  recursively  as  follows. 

•  Suppose  that  /  is  the  bottom  level  of  C.  Define  ?/(6,  k)  —  ai(b,n).  If  a i(b,n)  is  valid,  then 
(. b ,  k)  is  /-valid.  Otherwise  ( b ,  k)  is  not  /-valid. 

•  Suppose  that  /  is  not  the  bottom  level  of  £.  Let  bn  be  07(6,  k)  and  /»£,  be  ®i>^iqi(b,  k).  Also 
let  bu  be  bn  ©  &z,-  Define  <7(6,  k)  =  bu ■  If  bv  is  valid  then  ( b,n )  is  /-valid.  Otherwise  (/>,«)  is 
not  /-valid. 

Multilevel  database  (6,  k )  is  valid  if  it  is  /-valid  for  every  level  /  £  L.  Notice  that  the  /-views  of 
(r{,Ki)  and  (b,  k)  are  respectively  a  single-level  relation  and  database. 

The  above  definition  formalizes  our  intuition  about  views.  All  atomic  tuples  labeled  at  level 
/  are  part  of  the  /-view  since  bfj  E  <7(6,  k)  according  to  Theorem  11.1.  For  atomic  tuples  labeled 
below  /,  as  many  of  them  as  possible  are  part  of  the  /-view  since  they  are  part  of  bjj  according  to 
Theorem  11.1.  In  case  that  either  but  not  both  of  two  atomic  tuples  labeled  below  /  could  be  in 
the  /-view,  neither  is  in  the  /-view  since  neither  would  be  in  f>£  according  to  Theorem  11.2.  The 
multilevel  database  of  Figure  12.1  is  valid,  and  its  T-view  is  shown  in  Figure  13.1. 

Our  view  policy  coincides  with  the  Bell-LaPadula  model  for  primary  key  values,  in  the  sense 
that  all  low  primary  key  values  are  part  of  the  high  view.  In  the  extreme  case  that  there  are  no 
view  constraints  (i.e.,  X,-  =  A,-  for  1  <  i  <  n  and  C  is  empty),  our  view  policy  completely  coincides 
with  the  Bell-LaPadula  model,  in  the  sense  that  all  low  tuples  are  part  of  the  high  view,  since 

?/(&,«)  =  U 

Our  view  policy  is  an  extension  of  the  Bell-LaPadula  model,  in  the  sense  that  we  distinguish 
between  two  kinds  of  low  data:  those  that  are  believable  at  high  (e.g.,  the  atomic  tuple  (Enterprise, 
Rigel)),  and  those  that  are  visible  but  not  believable  at  high  (because  they  violate  view  constraints 
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Figure  13.1:  A  T-View 


when  combined  with  high  data)  (e.g.,  the  atomic  tuple  (Enterprise,  102)).  Integrity  is  enforced 
only  on  believable  low  data. 

Let  fi  be  the  number  of  levels  immediately  dominated  by  /,  and  /  be  max{//|/  E  L}.  For 
multilevel  relation  (r,-,  k;),  the  size  of  its  /-slice  c/(r,-,Kj)  and  /-view  «,■)  is  bounded  by  0(|r,|). 
The  cost  of  computing  ?/(r,-,/c,)  for  bottom  level  is  bounded  by  0(\L\  x  |r;|).  For  nonbottom  level 
/,  the  cost  of  computing  6#  is  bounded  by  0(\L\  x  |r,|),  the  cost  of  computing  &£,  is  bounded  by 
0(|r;|-^),  and  the  cost  of  computing  bu  is  bounded  by  0(|r,|2).  Assuming  that  \L\  <C  !'?”» |  and  2</, 
the  cost  of  computing  «,)  is  bounded  by  0{\L\\  x  |r,-|^),  where  X;  is  the  set  of  levels  dominated 
by  /• 


13.3  Validity  Checking 

Given  a  multilevel  database,  straightforward  validity  checking  based  on  the  recursive  definition  of 
validity  is  likely  to  be  expensive,  because  it  involves  computing  views  for  all  levels  and  checking 
their  validity.  Luckily,  multilevel  validity  could  be  equivalently  characterized  by  multilevel  security 
properties,  whose  computation  is  comparable  in  complexity  to  integrity  checking  in  single- level 
databases. 


Theorem  13.1  A  multilevel  database  is  valid  if  and  only  if  it  satisfies  polyinstantiation  and  refer¬ 
ential  security  properties. 


Theorem  13.1  tells  us  that  view  computation  is  not  necessary  for  validity  checking.  For  ex¬ 
ample,  the  multilevel  database  of  Figure  12.1  is  valid  because  it  satisfies  polyinstantiation  and 
referential  security  properties.  Furthermore,  validity  checking  in  multilevel  databases  is  compara¬ 
ble  in  complexity  to  that  in  single-level  databases.  The  cost  of  checking  polyinstantiation  security 
in  multilevel  relation  (r,-,  k,)  is  bounded  by  0(|L|2  x  |r\|2),  while  the  cost  of  checking  key  integrity  in 
single-level  relation  r,-  is  bounded  by  0(|r\|2).  For  Ri[Y]  *-*  Rj  in  C,  the  cost  of  checking  referential 
security  in  multilevel  database  (6,  k)  is  bounded  by  0(|X|3  X  |r,|  X  |rj|),  while  the  cost  of  checking 
referential  integrity  in  single-level  database  b  is  bounded  by  0(|r,|  X  |rj|). 
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13.4  Validity  Enforcement 


According  to  the  Bell-LaPadula  model,  low  updates  could  only  affect  views  at  comparably  higher 
levels.  For  the  classes  of  constraints  in  our  multilevel  relational  model,  a  low  insertion  will  not 
invalidate  any  high  views,  because  the  inserted  low  tuple  could  either  cause  other  low  tuples  to  be 
removed  from  a  high  view,  or  be  excluded  from  a  high  view  by  some  high  tuples.  The  only  situation 
in  which  a  low  deletion  will  invalidate  a  high  view  is  when  the  deleted  low  tuple  is  referred  to  by 
some  high  tuple  through  a  referential  dependency. 

Intuitively,  if  a  low  deletion  invalidates  a  high  view,  it  should  not  be  aborted.  Instead  it  should 
be  extended  with  necessary  compensating  updates  in  order  to  enforce  both  integrity  and  secrecy. 
Not  all  compensating  updates  are  acceptable.  They  should  have  at  least  the  following  three  natural 
properties: 

1.  The  compensating  updates  should  be  at  comparably  high  levels,  because  of  the  ^-property  of 
the  Bell-LaPadula  model. 

2.  The  compensating  high  updates  should  not  cause  a  loss  of  high  data.  In  other  words,  a  low 
deletion  should  only  be  extended  with  high  insertions. 

3.  The  amount  of  high  data  added  through  compensating  high  insertions  should  be  minimized. 

Let  (6,  k)  =  {(r,-,  K,)}i<,<n  be  a  valid  multilevel  database  over  multilevel  schema  (B,  £),  where 
B  =  (7 Z,C),  1Z  =  {.ft, -[X,-,  if, •]}!<,■<„,  and  £  =  (L,<).  Also  let  /  E  L  be  a  level  and  t  be  a  tuple 
over  X{.  An  update  at  level  l  has  the  form  insert j(t)  or  delete*- (/),  which  specifies  respectively  the 
insertion  to  or  deletion  from  multilevel  relation  ( rj ,  kj)  of  tuple  t  at  level  /. 

Let  op j(t)  be  an  update  at  level  /  where  op  is  either  insert  or  delete.  Define  (&',  k')  = 
{(rt-,  KD}i<»'<n  to  be  a  multilevel  database  over  (B,£),  where  (r',  k'-)  =  (r,-,  k,)  for  all  i  ^  j,  and 
k '■{£)  =  Kj(t')  for  all  t'  7^  t.  Define  r'-  and  «'(f)  as  follows. 

1.  op  =  insert 

(a)  r'j  —  rj  U  {/},  and 

(b)  K'j(t)  =  Kj(t)U{l}. 

2.  op  =  delete 

(a)  Kj(t)  =  Kj(t )  -  {/},  and 

(b)  r'j  =  rj  -  {f}  if  Kj(t)  =  {/}. 

The  update  op lj(t)  applied  to  multilevel  database  (6,  k)  is  correct  if  and  only  if  multilevel 
database  ( b',n ')  is  /-valid.  If  the  update  is  correct,  then  its  result  is  a  multilevel  database 
(b,k)  =  {(f,-,  K,)}i<,<n  defined  as  follows.  Let  t'  be  the  tuple  over  Xj  where  t'[Kj]  =  t[Kj]  and 
t'[Xj-Kj]  =  L 

1.  If  op  =  insert,  then  (6,  k)  =  ( b',K '). 
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2.  If  op  =  delete,  then  (ft-,Kt)  =  (r ■,  k')  for  all  i  /  j,  and  Kj(i)  =  Kj(f)  f°r  all  t  ^  t'.  Let  L  be 
the  set  of  levels  V  £  L  such  that  /  -<*  /',  ( b ',  k')  is  not  /'- valid,  and  (&',  k')  is  /-valid  for  every 
/  €  L  where  l  <*  l  /'. 

(a)  If  L1  -  {},  then  (6,k)  =  (&',«/)• 

(b)  If  V  ±  {},  then  fj  =  r '■  U  {?}  and  Kj(i')  =  «'•(/')  U  L' . 

For  example,  deleting  the  first  two  MT  tuples  consecutively  (in  either  order)  from  the  multilevel 
database  of  Figure  12.1  would  lead  to  inserting  the  MT  tuple  (101,  Z)  at  T. 

Theorem  13.2  If  the  update  is  correct,  then  the  result  of  the  update  is  valid. 

Theorem  13.2  tells  us  that  the  extended  update  preserves  the  validity  of  multilevel  databases. 
It  is  also  secure  because  correct  low  deletions  will  not  be  aborted. 

The  extended  update  also  satisfies  the  three  properties  identified  earlier,  for  the  following  rea¬ 
sons. 

1.  The  compensating  updates  are  at  comparably  high  levels.  For  any  level  l'  where  l  ^  l ,  the 
/'-view  after  an  update  at  /  remains  the  same.  The  /-view  after  an  update  at  /  differs  from 
that  before  the  update  precisely  in  the  effect  of  the  update. 

2.  The  compensating  high  updates  do  not  cause  a  loss  of  high  data.  For  any  level  /  where 
/  -<*  /',  the  necessary  compensating  high  insertions  at  /'  are  performed  to  restore  referential 
security,  if  it  is  violated  by  a  deletion  at  level  /. 

3.  The  amount  of  high  data  added  through  compensating  high  insertions  is  minimized.  First, 
for  any  level  /'  where  /  -<*  /',  compensating  insertions  are  performed  at  /'  only  if  a  deletion  at 
level  /  invalidates  the  /'-view.  Second,  the  number  of  levels  at  which  compensating  insertions 
are  performed  is  minimized  by  the  definition  of  L' .  Third,  the  information  content  of  the 
tuple  to  be  inserted  by  compensating  insertions  is  minimized  by  the  definition  of  /'. 

It  is  easy  to  see  that  view  computation  is  not  necessary  for  validity  enforcement.  When  delet¬ 
ing  tuple  t  labeled  at  level  /  in  relation  r,-,  the  set  of  tuples  t  that  refer  to  t  through  referential 
dependencies  could  be  computed  as  a  by-product  of  checking  the  referential  security  property.  The 
set  of  minimal  levels  /'  of  t  such  that  /  -<  /',  namely  L',  is  the  set  of  levels  at  which  the  primary  key 
value  of  t,  namely  t',  needs  to  be  inserted. 
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Chapter  14 

Update  Policy 


An  update  policy  consists  of  a  set  of  labeling  constraints,  a  set  of  updates,  and  a  specification  of  the 
enforcement  of  labeling  constraints  in  performing  the  updates.  This  policy  specifies  the  mechanisms 
to  eliminate  inference  channels  in  the  enforcement  of  labeling  constraints. 

14.1  Sample  Update  Policies 

We  consider  the  restricted- value  policy  of  [43]  and  the  insert-low  policy  of  [62],  both  of  which  are 
designed  to  eliminate  inference  channels  in  the  enforcement  of  the  no-polyinstantiation  constraint. 
For  easy  presentation,  we  adapt  these  policies  to  the  context  of  multilevel  databases  with  tuple-level 
labeling.  The  no-polyinstantiation  constraint  states: 

Two  distinct  tuples  cannot  have  identical  primary  key  values. 

If  low  users  insert  a  tuple  which  has  the  same  primary  key  value  as  an  existing  high  tuple,  then 
either  the  low  insertion  has  to  be  rejected,  leading  low  users  to  infer  the  existence  of  the  high  tuple, 
or  the  high  tuple  has  to  be  overwritten,  causing  a  loss  of  high  data.  Similarly,  if  high  users  insert  a 
tuple  which  has  the  same  primary  key  value  as  an  existing  low  tuple,  then  either  the  low  tuple  has 
to  be  deleted,  leading  low  users  to  infer  the  existence  of  the  high  tuple,  or  the  high  insertion  has 
to  be  rejected,  causing  a  loss  of  high  data. 

The  example  below  illustrates  how  the  restricted-value  policy  removes  this  dynamic  inference 
channel  in  the  no-polyinstantiation  constraint.  Consider  the  following  multilevel  relation  over  the 
schema  of  Figure  11.1  and  the  lattice  of  Figure  10.1: 


Starship 

Mission 

Destination 

Enterprise 

102 

Rigel 

When  users  try  to  replace  102  by  101  at  level  T,  the  update  is  extended  to: 
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1.  Replace  102  by  yj  at  level  J_. 

2.  Insert  (Enterprise,  101,  Rigel)  at  level  T. 

The  extended  update  ensures  no-polyinstantiation  at  the  price  of  introducing  a  (partial)  static 
inference  channel,  because  users  at  level  _L  can  infer  from  the  restricted-value  that  Enterprise 
has  a  high  mission.  Moreover,  the  high  update  is  extended  with  a  low  insertion,  which  is  against 
the  spirit  of  the  ^-property  of  the  Bell-LaPadula  model. 

The  example  below  illustrates  how  the  insert-low  policy  removes  this  dynamic  inference  channel 
in  the  no-polyinstantiation  constraint.  Consider  the  following  multilevel  relation  over  the  schema 
of  Figure  11.1  and  the  lattice  of  Figure  10.1: 


Starship 

Mission 

Destination 

Enterprise 

101 

Rigel 

When  users  try  to  insert  tuple  (Enterprise,  102,  Rigel)  at  level  J_,  the  update  is  extended  to: 

1.  Delete  (Enterprise,  101,  Rigel)  at  level  T. 

2.  Insert  (Enterprise,  102,  Rigel)  at  level  ±. 

The  extended  update  ensures  no-polyinstantiation  at  the  price  of  losing  high  data. 

In  the  rest  of  this  chapter,  we  characterize  the  desirable  properties  of  update  policies  for  multi¬ 
level  databases  with  tuple-level  labeling,  which  do  not  suffer  from  the  above-mentioned  problems. 

14.2  Polarity  and  Force 

For  easy  presentation  of  our  results,  we  borrow  two  standard  syntactic  notions  in  first-order  logic 
from  [31].  For  every  formula  a,  we  assign  a  polarity  to  every  subformula  in  a,  which  is  either  positive 
or  negative.  The  polarity  of  a  subformula  in  a  provides  a  syntactic  indication  as  to  how  the  truth 
of  the  subformula  relates  to  the  truth  of  a.  The  polarity  of  any  subformula  in  a  is  determined  by 
the  following  rules: 

1.  a  has  positive  polarity. 

2.  If  a  has  the  form  ->/?,  then  0  has  polarity  opposite  to  a. 

3.  If  a  has  the  form  0  A  7  or  0  V  7,  then  0  and  7  have  the  same  polarity  as  a. 

4.  If  a  has  the  form  0  — *■  7,  then  0  has  polarity  opposite  to  a  and  7  has  the  same  polarity  as  a. 

5.  If  a  has  the  form  (Vx)/3  or  (3x)0,  then  0  has  the  same  polarity  as  a. 
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Based  on  the  polarities  of  subformulas  in  a,  we  assign  &  force  to  every  quantifier  in  a,  which  is 
either  universal  or  existential.  The  force  of  a  quantifier  in  a  gives  a  syntactic  indication  of  what 
role  the  quantifier  has  towards  the  truth  of  a.  For  a  subformula  /3  in  a  of  the  form  (Vx)7  or  (3x)7, 
the  force  of  the  outmost  quantifier  in  (3  is  determined  by  the  following  rules: 

1.  The  quantifier  has  universal  force  if 

(a)  it  is  a  universal  quantifier  and  (3  has  positive  polarity;  or 

(b)  it  is  an  existential  quantifier  and  j3  has  negative  polarity. 

2.  The  quantifier  has  existential  force  if 

(a)  it  is  an  existential  quantifier  and  f3  has  positive  polarity;  or 

(b)  it  is  a  universal  quantifier  and  (3  has  negative  polarity. 

14.3  Labeling  Constraints 

Now  we  are  ready  to  define  labeling  constraints  in  the  multilevel  relational  model.  Level  expressions 
are  defined  recursively  as  follows: 

•  Every  level  l  in  C  is  a  level  expression. 

•  Given  level  expressions  l\  and  1%,  l\  U  Z2  and  l\  n  /2  are  level  expressions  denoting  respectively 
the  least  upper  bound  and  the  greatest  lower  bound  of  l\  and  Z2. 

Given  level  expressions  l\  and  Z2  where  l\  <  Z2,  a  lattice  expression  has  the  form  £{2 ,  denoting 
the  sublattice  of  £  whose  bottom  and  top  levels  are  l\  and  Z2  respectively. 

Recall  from  Section  12.1  that  a  multilevel  schema  is  a  pair  (5,£),  where  B  =  (7£,£),  ^  = 
{ jf£t‘] } x < t' < tx j  and  £  =  (L,  ■<).  A  labeling  constraint  is  a  sentence  in  many-sorted  first-order 
predicate  calculus.  There  is  a  domain  sort  and  a  level  sort.  Predicate  symbols  include  domain 
equality  and  Rf  where  x  is  a  level  variable.  Non-equality  atomic  formulas  are  relational  formulas. 
Quantifiers  of  the  level  sort  have  the  form  ( Qx  €  L ),  where  Q  is  either  V  or  3,  and  L  is  a  lattice 
expression  whose  bottom  level  is  either  1  or  a  variable  with  _L  as  the  default,  and  whose  top  level  is 
either  T  or  a  variable  with  T  as  the  default.1  A  labeling  constraint  over  the  schema  of  Figure  11.1 
and  the  lattice  of  Figure  10.1  might  be: 

(V/  e  £)(Vx,  y,  «)(MT'(*,  y)  A  MT'(i,  z)  -»  y  =  z)  (14.1) 

which  states  the  polyinstantiation  security  property  of  MT,  namely  MT  tuples  labeled  at  the  same 
level  have  unique  Missionld  values.  Another  labeling  constraint  over  the  same  schema  and  lattice 
might  be: 

(V/i  e  £)(Vx,  y,  2)(SMD(l  (x,  y,  z)  -►  (3 12  €  Ch  )(3tu)MTi2(y,  w))  (14-2) 

JWe  do  not  allow  arbitrary  level  constants  in  labeling  constraints.  This  makes  the  specification  of  constraints 
independent  of  specific  lattices.  It  also  simplifies  the  definition  of  /-validity  for  labeling  constraints  later.  Nevertheless, 
the  results  presented  here  could  easily  be  generalized  to  labeling  constraints  containing  arbitrary  level  constants. 
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which  states  the  referential  security  property  from  SMD  to  MT,  namely  every  SMD  tuple  refers  to 
an  MT  tuple  at  the  same  or  a  lower  level  [37]. 

A  labeling  constraint  a  is  single-level  if  it  has  the  form  (Vx  £  £)/3;  where  f 3  does  not  contain 
level  quantifiers.  For  example,  constraint  (14.1)  is  single-level,  while  constraint  (14.2)  is  not. 

Given  a  level  /  in  £,  the  l-instance  of  a  labeling  constraint  a  is  a[l/ T],  which  is  denoted  by  a  . 
Notice  that  the  T-instance  of  a  is  a,  and  the  /-instance  of  a  is  a  if  a  does  not  contain  the  level 
constant  T. 

Recall  from  Section  12.1  that  a  multilevel  database  over  multilevel  schema  ( B ,  £)  is  a  pair  ( b ,  k) 
where  b  =  {rJ^Kn  and  k  =  {k,}i<,<„.  Given  a  level  /  in  £,  a  labeling  constraint  a  is  l-valid  in 
(6,k)  if  the  /- instance  of  a  is  true  in  the  first-order  structure  that  assigns  the  /'-slice  <7,,(r;,  k.)  to 
Rli  for  1  <  i  <  n  and  V  in  £,  which  is  denoted  as  (6,#e)  \=  a1.  A  labeling  constraint  a  is  valid  in 
(6,k)  if  a  is  T-valid  in  (/>,«).  For  example,  constraints  (14.1)  and  (14.2)  are  both  mi-valid  and 
valid  in  the  multilevel  database  of  Figure  12.1. 

Since  (/»,  k)  (=  a1  iff  (/>,  k)1  [=  a1  for  any  level  /  in  £,  the  /-validity  of  a  could  be  checked  at  /, 
but  not  at  any  level  lower  than  or  incomparable  to  /,  by  accessing  only  data  that  are  visible  at  /, 
namely  (b,  k)1.  Therefore,  if  a  contains  the  level  constant  T ,  then  its  validity  could  only  be  checked 
at  T. 

14.4  Static  Inference  Channels 

Static  inference  channels  are  found  in  a  particular  state  of  the  database,  and  depend  on  the  data 
and  labeling  constraints  true  in  that  state  of  the  database.  In  other  words,  from  low  data  combined 
with  labeling  constraints,  the  low  user  could  often  infer  some  high  information.  For  example, 
consider  a  labeling  constraint  which  requires  that  every  foreign  key  value  refers  to  a  primary  key 
value.  If  a  foreign  key  value  is  labeled  low  but  the  primary  key  value  it  refers  to  is  labeled  high,  then 
the  low  user  would  see  the  foreign  key  value  but  not  the  primary  key  value.  Hence,  the  existence  of 
the  primary  key  value  could  be  inferred  by  the  low  user  from  the  foreign  key  value  if  he  knows  about 
the  labeling  constraint.  Static  inference  channels  through  functional  and  multivalued  dependencies 
were  studied  in  [32,  54]. 

We  identify  common  classes  of  labeling  constraints  whose  enforcement  is  free  of  static  inference 
channels.  Intuitively,  a  static  inference  channel  exists  in  a  multilevel  database  if  high  information 
could  be  derived  from  low  tuples  together  with  (the  low  instances  of)  labeling  constraints.  In  other 
words,  low  tuples  combined  with  labeling  constraints  logically  imply  some  sentence  that  is  not  valid 
in  the  low  database.  Since  the  multilevel  database  is  known  to  be  valid  with  respect  to  labeling 
constraints,  the  low  user  could  infer  that  the  implied  sentence  must  be  valid  in  the  high  database. 

Theorem  14.1  If  all  level  quantifiers  in  a.  have  universal  force,  then  there  is  no  static  inference 
channel  in  a. 

As  a  consequence  of  Theorem  14.1,  there  is  no  static  inference  channel  in  single-level  labeling 
constraints.  For  example,  there  is  no  static  inference  channel  in  constraint  (14.1).  In  other  words, 
there  could  not  be  a  valid  MT  relation  containing  two  tuples  at  the  same  level  and  with  the  same 
Missionld  value. 
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Corollary  14.2  If  every  level  quantifier  in  a  with  existential  force  has  the  form  ( Qy  6  L)/3,  where 
L  is  a  lattice  expression  whose  top  level  is  not  T,  then  there  is  no  static  inference  channel  in  a. 

For  example,  there  is  no  static  inference  channel  in  constraint  (14.2)  according  to  Corollary  14.2. 
In  other  words,  there  could  not  be  a  valid  database  that  contains  an  SMD  tuple  not  referring  to 
any  lower  MT  tuple.  However,  if  we  relax  the  requirement  by  replacing  constraint  (14.2)  with  the 
following  labeling  constraint  (14.3)  instead,  which  states  that  every  SMD  tuple  refers  to  an  MT 
tuple: 

(V/i  €  £)(Va:,t/,2)(SMD'1(a;,?/,^)  -  (3 12  €  >C)(3u;)MT/2(2/,  w))  (14.3) 

then  there  are  static  inference  channels  at  l  for  every  l  ^  T.  If  a  low  SMD  tuple  does  not  refer  to 
any  low  MT  tuple,  then  it  must  refer  to  a  high  MT  tuple. 

A  natural  way  of  eliminating  static  inference  channels  in  labeling  constraint  a  is  by  requiring 
that  a  is  /-valid  for  every  level  /  in  C.  In  other  words,  if  labeling  constraints  are  enforced  at 
every  level,  then  there  is  no  static  inference  channel.  Hence  we  could  insist  that  constraint  (14.3) 
be  enforced  at  every  level,  namely  every  SMD  tuple  has  to  refer  to  a  visible  MT  tuple,  which  is 
equivalent  to  enforcing  constraint  (14.2). 

14.5  Dynamic  Inference  Channels 

Dynamic  inference  channels  are  found  in  a  particular  state  transition  of  the  database,  and  depend 
on  the  data  and  labeling  constraints  true  in  the  state  before  the  transition  as  well  as  the  behavior 
of  the  database  in  response  to  the  transition.2  In  other  words,  the  result  of  a  low  update  could 
violate  some  labeling  constraints  when  combined  with  high  data,  but  prohibiting  the  low  update 
would  enable  the  low  user  to  infer  the  existence  of  relevant  high  data.  For  example,  consider 
a  labeling  constraint  which  requires  that  every  foreign  key  value  refers  to  a  visible  primary  key 
value.  If  a  foreign  key  value  is  labeled  high  but  the  primary  key  value  it  refers  to  is  labeled  low, 
then  prohibiting  a  low  deletion  of  the  primary  key  value  would  enable  the  low  user  to  infer  the 
existence  of  the  high  foreign  key  value.  Dynamic  inference  channels  through  the  enforcement  of 
polyinstantiation  and  referential  integrity  were  studied  in  [7,  43]. 

We  identify  common  classes  of  labeling  constraints  whose  enforcement  is  free  of  dynamic  infer¬ 
ence  channels.  Intuitively,  a  dynamic  inference  channel  exists  in  a  multilevel  database  if  the  result 
of  a  low  update  is  not  valid  with  respect  to  labeling  constraints,  even  if  the  low  database  is  valid 
with  respect  to  (the  low  instances  of)  labeling  constraints  [7,  43].  In  other  words,  a  low  update 
results  in  a  valid  low  database  but  not  a  valid  high  database.  Since  the  multilevel  database  has  to 
be  valid,  the  low  update  has  to  be  prohibited,  thus  enabling  the  low  user  to  infer  the  existence  of 
relevant  high  data. 

Theorem  14.3  If  all  level  quantifiers  in  a  have  existential  force,  then  there  is  no  dynamic  inference 
channel  in  a. 

Because  dynamic  inference  channels  involve  system  behavior,  in  many  cases  the  mechanisms  could  also  be  used 
as  covert  signaling  channels  between  cooperating  malicious  high  and  low  subjects.  However,  we  concern  ourselves 
here  only  with  undesired  inferences  through  such  mechanisms,  which  do  not  require  either  a  high  “sender”  or  that 
the  low  “receiver”  be  malicious. 
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For  example,  there  is  no  dynamic  inference  channel  in  the  following  labeling  constraint  (14.4) 
according  to  Theorem  14.3,  which  states  that  there  is  a  level  at  which  no  starships  are  going  to 
Rigel: 

(3/  €  £)(Vz,  y , ^(SMD'Or,  y, z)  -  2  #  Rigel).  (14.4) 

If  there  is  a  level  in  the  low  database  after  a  low  update  at  which  no  starships  are  going  to  Rigel, 
then  there  is  definitely  such  a  level  in  the  entire  database  after  the  low  update. 

Corollary  14.4  If  every  level  quantifier  in  a  with  universal  force  has  the  form  ( Qy  E  L)fi,  where 
L  is  a  lattice  expression  whose  top  level  is  not  T ,  then  there  is  no  dynamic  inference  channel  in  a. 

For  example,  there  is  no  dynamic  inference  channel  in  the  following  labeling  constraint  (14.5) 
according  to  Corollary  14.4,  which  states  that  there  is  a  level  at  which  polyinstantiation  is  prohibited 
for  the  Missionld  attribute  of  MT: 

(3/  €  £)(VZi,fa  €  £i)(Va:,yi,i/2)(MT,1(i,2/i)  A  MTi2(x,2/2)  -  Vi  =  i/2).  (14.5) 

If  there  is  a  level  in  the  low  database  after  a  low  update  at  which  polyinstantiation  is  prohibited 
for  the  Missionld  attribute  of  MT,  then  there  is  definitely  such  a  level  in  the  entire  database  after 
the  low  update. 

Theorem  14.5  If  a  is  single-level,  then  there  is  no  dynamic  inference  channel  in  a. 

For  example,  there  is  no  dynamic  inference  channel  in  constraint  (14.1)  according  to  Theo¬ 
rem  14.5.  If  no  two  low  MT  tuples  have  the  same  Missionld  value  after  a  low  update,  and  no  two 
high  MT  tuples  have  the  same  Missionld  value  before  the  low  update,  then  no  two  MT  tuples  at 
the  same  level  have  the  same  Missionld  value  after  the  low  update. 

It  is  worth  noticing  that  there  is  a  syntactic  symmetry  between  the  classes  of  labeling  constraints 
whose  enforcement  is  free  of  static  inference  channels,  such  as  the  (V)*-class  (Theorem  14.1)  and 
the  (V)*(3)*-class  (Corollary  14.2),  and  the  classes  of  labeling  constraints  whose  enforcement  is 
free  of  dynamic  inference  channels,  such  as  the  (3)*-class  (Theorem  14.3)  and  the  (3)*(V)*-class 
(Corollary  14.4).  Moreover,  the  enforcement  of  the  class  of  single-level  labeling  constraints  is  free 
of  both  static  and  dynamic  inference  channels. 

14.6  Eliminate  Inference  Channels 

We  identify  common  classes  of  labeling  constraints  whose  enforcement,  although  not  free  of  dynamic 
inference  channels,  can  be  made  free  of  static  and  dynamic  inference  channels  if  we  extend  the 
enforcement  by  a  proper  update  policy. 

When  a  labeling  constraint  cannot  be  enforced  because  of  a  dynamic  inference  channel,  the  low 
update  could  sometimes  be  extended  to  a  multilevel  update  that  enforces  the  labeling  constraint. 
Because  of  the  ^-property  of  the  Bell-LaPadula  model,  a  natural  requirement  is  that  the  low  update 
should  only  be  extended  with  updates  at  high  levels.  A  removable  dynamic  inference  channel  is 
one  for  which  such  an  update  policy  exists. 
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Theorem  14.6  Suppose  that  labeling  constraint  a  has  the  form 

(Vxx  eLi,...,xke  Lk)(Vy)(Pi  A  •  ••  A  Pm  ->■ 

(3xi  ex;,. e  4-)(3y')(A' a •  •  •  a p;,)) 

where  Pi,...,Pm  are  relational  formulas,  P( , . . . ,  P'm,  are  atomic  formulas,  and  y  and  y'  are  se¬ 
quences  of  domain  variables ,3  If  x,-  is  dominated  by  the  bottom  level  of  Li+ 1  for  1  <  i  <  k  —  1,  and 
the  top  level  of  every  L\  is  dominated  by  xk  for  1  <  i  <  k' ,  then  every  dynamic  inference  channel 
in  a  is  removable. 

For  example,  there  are  no  dynamic  inference  channels  in  constraint  (14.1)  according  to  The¬ 
orem  14.6.  Suppose  that  we  strengthen  the  requirement  by  replacing  constraint  (14.1)  with  the 
following  labeling  constraint  (14.6),  which  states  that  polyinstantiation  is  prohibited  for  the  Mis- 
sionld  attribute  of  MT: 

(VZi,Z2  €  £)(Vx,y,z)(MT/l(x,t/)  A  MT*2(x,,z)  -  y  =  z)  (14.6) 

then  there  are  dynamic  inference  channels  at  l  for  every  l  ^  T,  because  the  insertion  of  a  low  MT 
tuple  having  the  same  Missionld  as  another  high  MT  tuple  would  be  prohibited.  Moreover,  not 
all  dynamic  inference  channels  are  removable,  since  the  insertion  of  an  MT  tuple  at  mi  having 
the  same  Missionld  as  another  MT  tuple  at  m2  could  not  be  extended  with  the  deletion  of  any 
high  MT  tuples  to  avoid  dynamic  inference  channels.  However,  suppose  that  we  replace  constraint 
(14.1)  with  the  following  labeling  constraint  (14.7)  instead,  which  states  that  polyinstantiation  is 
prohibited  at  comparable  levels  for  the  Missionld  attribute  of  MT: 

(VZi  €  CM  €  ^ii)(Vx,  t/,  ^)(MTil(x,  y)  A  MT*2(x,z)  -  y  =  *)  (14.7) 

then  there  are  still  dynamic  inference  channels  at  /  for  every  /  ^  T,  but  all  of  them  are  removable 
according  to  Theorem  14.6.  The  insertion  of  a  low  MT  tuple  having  the  same  Missionld  as  another 
high  MT  tuple  could  be  extended  with  the  deletion  of  the  high  tuple  to  avoid  dynamic  inference 
channels.  Notice  that  for  a  totally  ordered  lattice,  constraints  (14.6)  and  (14.7)  are  equivalent. 

Not  every  update  policy  is  acceptable.  A  natural  requirement  of  an  update  policy  is  that  the 
extended  multilevel  update  should  not  cause  a  loss  of  high  data.  In  other  words,  a  low  update 
should  only  be  extended  with  high  insertion.  A  lossless  dynamic  inference  channel  is  one  for  which 
such  a  lossless  update  policy  exists.  For  example,  the  above  update  policy  for  constraint  (14.7)  is 
not  lossless. 

Theorem  14.7  If  all  relational  formulas  in  labeling  constraint  a  have  negative  polarity,  then  no 
dynamic  inference  channel  in  a  is  lossless. 

According  to  Theorem  14.7,  no  dynamic  inference  channels  in  constraint  (14.7)  are  lossless. 
Hence  they  cannot  be  removed  by  any  lossless  update  policy. 

3We  assume  that  the  universal  level  quantifiers  are  not  vacuous;  namely,  k  >  1,  and  every  x,  appears  in  some  P, 
for  1  <  i  <  k  and  1  <  j  <  m. 
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Theorem  14.8  Suppose  that  labeling  constraint  a  has  the  form 

(V®  €  X)(Vy)(Pi  A  •  •  •  A  Pm  — ►  (3®!  £Lu...,xk€  Lk)(3y')(P{  A  •  •  •  A  P^-)) 

where  Pu...,Pm  are  relational  formulas,  P{,...,P^  are  atomic  formulas,  and  y  and  y'  are  se¬ 
quences  of  domain  variables.4  If  x  is  the  top  level  of  Li  for  1  <  *  <  k,  then  every  dynamic  inference 
channel  in  a  is  lossless. 

For  example,  there  are  dynamic  inference  channels  in  constraint  (14.2)  at  every  level  /  for  /  ^  T, 
when  a  deleted  low  primary  key  value  is  referred  to  by  an  existing  high  foreign  key  value.  However, 
we  can  define  a  lossless  update  policy  that  removes  all  these  dynamic  inference  channels  according 
to  Theorem  14.8,  because  the  low  deletion  of  the  primary  key  value  could  always  be  extended  with 
a  high  insertion  of  the  same  primary  key  value. 

Not  every  lossless  update  policy  is  acceptable.  A  natural  requirement  of  a  lossless  update  policy 
is  that  the  extended  multilevel  update  should  be  minimized  in  the  amount  of  change  to  high  data. 
In  other  words,  the  number  of  high  insertions  should  be  minimal.  However,  a  minimal  and  lossless 
update  policy  is  in  general  not  unique.  For  example,  a  minimal  update  policy  for  constraint  (14.2) 
could  extend  a  deletion  of  the  first  two  MT  tuples  in  Figure  12.1  with  the  insertion  of  either  tuple 
at  T,  both  of  which  are  minimal.  We  could  often  achieve  uniqueness  by  considering  subclasses  of 
updates.  For  example,  if  we  restrict  ourselves  to  updates  consisting  of  the  insertion  or  deletion  of 
single  tuples,  then  a  unique,  minimal,  and  lossless  update  policy  can  be  defined  that  removes  every 
dynamic  inference  channel  in  constraint  (14.2),  as  is  shown  in  [37]. 

Theorem  14.9  Suppose  that  labeling  constraint  a  has  the  form5 

(V®  €  I)(Vy)(Px  A  •  •  •  A  Pm  —  (3!*i  €  Ii, . .  .,**  €  Lk)(3\y')(P[  A  •  •  •  A  P'm,)) 

where  Pu  . . .,  Pm  are  relational  formulas,  P[,...,P'm,  are  atomic  formulas,  and  y  and  y'  are  se¬ 
quences  of  domain  variables.6  If  x  is  the  top  level  of  Li  for  \<i<k,  then  there  is  a  unique, 
minimal,  and  lossless  update  policy  that  removes  every  dynamic  inference  channel  in  a. 

For  example,  if  we  strengthen  constraint  (14.2)  into  the  following  labeling  constraint  (14.8), 
which  states  that  every  foreign  key  value  refers  to  a  unique  visible  primary  key  value: 

(VJi  G  £)(V®,  y,  ®)(SMDil(®,  y,  z)  — ►  (31^2  €  Ch)(3\w)UTh(y,w))  (14.8) 

then  a  unique,  minimal,  and  lossless  update  policy  can  be  defined  that  removes  every  dynamic 
inference  channel  according  to  Theorem  14.9.  Since  every  MT  tuple  is  referred  to  by  at  most  one 
SMD  tuple,  a  deleted  low  MT  tuple  would  have  to  be  re-inserted  at  high  if  it  is  referred  to  by  any 
high  SMD  tuple. 

When  labeling  constraints  are  restricted  to  polyinstantiation  and  referential  security  properties, 
we  could  do  even  better.  In  fact,  the  update  semantics  of  Section  13.3  actually  defines  an  update 
policy  for  these  constraints,  which  is  unique,  minimal,  and  lossless. 

4 We  assume  that  the  existential  quantification  is  not  vacuous;  namely,  m'  >  1,  and  there  is  an  x,  in  every  P,  for 
1  <  :  <  k  and  1  <  j  <  m1 . 

5 We  use  3!  to  denote  the  quantifier  “there  exists  a  unique...”. 

6  Again,  we  assume  that  the  existential  quantification  is  not  vacuous. 
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14.7  Design  Guidelines 

Our  results  offer  valuable  guidelines  to  database  designers  for  the  appropriate  specification  and 
simple  characterization  of  inference  channel-free  labeling  constraints,  both  positive  and  negative. 
The  following  are  some  sample  guidelines: 

1.  The  class  of  single-level  labeling  constraints,  such  as  the  polyinstantiation  security  property, 
is  a  safe  class  of  labeling  constraints,  since  it  is  free  of  both  static  and  dynamic  inference 
channels. 

2.  If  a  non- single-level  labeling  constraint  is  free  of  static  inference  channels,  then  it  is  most 
likely  not  free  of  dynamic  inference  channels,  and  vice  versa,  due  to  the  symmetry  discussed 
in  earlier. 

3.  The  referential  security  property  is  free  of  static  inference  channels,  and  it  could  not  be  further 
relaxed  without  introducing  static  inference  channels. 

4.  Static  inference  channels  could  be  removed  by  enforcing  labeling  constraints  at  every  level, 
and  dynamic  inference  channels  could  often  be  removed  by  a  proper  update  semantics. 

5.  Replacing  the  polyinstantiation  security  property  with  unconditional  no-polyinstantiation 
constraints  introduces  inference  channels  that  could  not  be  completely  removed,  even  by 
giving  up  high  data. 

6.  The  referential  security  property  contains  lossless  dynamic  inference  channels.  However,  either 
updates  should  be  restricted  or  a  stronger  form  of  the  property  should  be  enforced  in  order 
to  achieve  a  unique,  minimal,  and  lossless  update  semantics. 
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Chapter  15 

Secure  Interoperation 


Recent  advances  in  distributed  systems  and  networking  technology  have  made  interoperation  not 
only  feasible  but  also  increasingly  popular.  For  example,  heterogeneous  databases  can  be  linked 
by  high-speed  networks  that  consist  of  heterogeneous  networks  connected  by  gateways.  In  such 
an  application  environment,  heterogeneity  (such  as  in  data  semantics,  data  representation,  and 
communication  protocol)  among  system  components  must  be  reconciled  properly.  Some  research 
efforts  are  under  way  to  deal  with  these  problems  [51]. 

One  attribute  of  interoperation  that  needs  reconciliation  but  has  not  been  closely  studied  is 
security  with  regard  to  access  control.  Consider  an  application  involving  multiple  systems  dealing 
with  commerce  (e.g.,  national  credit  databases),  finance  (e.g.,  stock  market  information  systems), 
medicine  (e.g.,  patient  records),  and  defense,  each  having  a  distinct  access  control  structure.  To 
facilitate  information  exchange  among  such  systems,  some  mapping  between  the  heterogeneous  se¬ 
curity  attributes  must  be  introduced,  for  example,  by  the  system  administrators.  Current  practices 
show  that  these  mappings,  even  if  chosen  carefully,  can  result  in  security  breaches  that  previously 
did  not  exist  in  any  individual  system  (e.g.,  [58,  33]). 

Secure  interoperation  is  a  serious  concern  for  military  systems1  as  well  as  commercial  ones.  For 
example,  consider  the  information  system  of  a  major  research  organization  where  Alice,  being  a 
project  supervisor,  is  allowed  access  to  Bob’s  files,  but  not  vice  versa.  Suppose  that  this  organization 
has  just  been  purchased  by  a  corporation  where  Charles  is  Vice  President  for  research  and  Diana, 
being  his  secretary,  has  access  to  his  files.  After  the  merger,  it  seems  natural  to  permit  Charles 
to  access  Alice’s  project  papers.  But  if  Bob  should  be  allowed  access  to  Diana’s  file  cabinet,  there 
would  be  a  security  violation  because  now  Bob  would  potentially  have  access  (indirectly  via  Diana 
and  Charles)  to  Alice’s  files  to  which  he  should  be  denied  access. 

Although  the  security  violation  in  this  example  may  not  be  too  difficult  to  remove,  a  real-world 
system  could  have  hundreds  or  thousands  of  entries  in  its  access  control  list  so  that  choosing  a  secure 
yet  satisfactory  (e.g.,  with  maximum  data  sharing)  mapping  between  many  such  access  control  lists 
is  a  daunting  task.  In  other  words,  interoperation  of  systems  with  heterogeneous  access  control 

1  It  is  estimated  in  the  Defense  Information  Systems  Agency’s  Defense  Information  System  Network  Technology 
Requirements  Document,  version  0  (August  3,  1993)  that  the  U.S.  DoD  enterprise  has  more  than  10,000  networks 
worldwide,  most  of  which  are  not  interoperable  with  each  other  and  do  not  adequately  support  information  sharing. 
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structures  poses  the  following  new  challenges:  what  is  the  definition  of  secure  interoperation?  How 
can  security  violations  be  detected?  And  how  can  these  violations  be  removed  while  a  maximum 
amount  of  information  exchange  is  still  facilitated?  We  attempts  to  answer  some  of  these  questions. 
First  we  turn  to  what  we  think  are  the  fundamental  requirements  in  secure  interoperation. 

15.1  Principles  of  Secure  Interoperation 

One  essential  feature  in  federated  systems  is  the  autonomy  of  an  individual  system  each  system 
may  be  administrated  independently  [6,  51].  To  preserve  this  feature  in  secure  interoperation, 
autonomy  in  security  must  be  guaranteed. 

Principle  of  Autonomy.  Any  access  permitted  within  an  individual  system  must  also 
be  permitted  under  secure  interoperation. 

On  the  other  hand,  interoperation  should  not  violate  the  security  of  an  individual  system. 

Principle  of  Security.  Any  access  not  permitted  within  an  individual  system  must 
be  also  denied  under  secure  interoperation. 

All  other  new  access  introduced  by  interoperation  should  be  permitted  unless  explicitly  denied 
by  the  specification  of  secure  interoperation.  Note  that,  unless  specified  otherwise,  by  access  we 
mean  direct  or  indirect  access. 

It  is  conceivable  that  under  some  circumstances  a  system  may  be  willing  to  sacrifice  some  of  its 
autonomy. 

15.2  System  Model  and  Terminology 

In  our  discussion,  the  security  attributes  of  a  system  are  expressed  with  an  access  control  list  (ACL) 
[26].  We  view  a  system  as  a  collection  of  users,  machines,  data  objects,  and  others,  each  being  a 
distinct  unit  with  regard  to  security. 

The  task  we  are  facing  is  the  following:  given  a  set  of  access  control  lists  that  are  individually 
secure,  define  what  secure  interoperation  is,  and  investigate  the  complexity  of  detecting  security  vio¬ 
lations  in  the  global  system  and  that  of  removing  security  violations  while  maintaining  a  reasonable 
level  of  interoperation. 

It  has  been  previously  shown  that  the  security  of  any  given  access  control  list  is  in  general 
undecidable  [19],  and  some  variations  of  the  decision  problem  are  at  best  NP-complete  [45].  There¬ 
fore,  we  also  expect  to  obtain  NP-completeness  results  and  thus  follow  the  general  proof  method 
for  NP-completeness  to  investigate  only  a  restricted  problem  where  in  each  ACL:  (1)  each  subject 
owns  exactly  one  file,  with  read  and  write  access;  (2)  a  subject  can  have  only  read  access  to  a  file 
owned  by  someone  else;  (3)  if  a  subject  can  read  another’s  file,  the  latter  cannot  read  the  former’s 
file;  (4)  an  ACL  is  static  in  that  read  and  write  are  the  only  types  of  access  specified. 

Our  NP-completeness  results  should  imply  similar  NP-completeness  results  for  formations  of 
the  problem  using  more  general  access  control  lists.  In  addition,  given  the  particular  restrictions  on 
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ACL,  our  results  should  also  imply  NP-completeness  results  for  the  interoperation  of  Bell-LaPadula 
(e.g.,  [27])  type  multilevel  secure  systems. 

In  our  discussion,  we  use  the  following  terminology,  notations,  and  definitions.  Because  one 
subject  owns  exactly  one  file,  there  is  no  need  to  distinguish  between  a  subject  and  its  file.  For 
example,  instead  of  saying  that  Alice  has  access  to  Bob’s  file,  we  can  simply  say  that  Alice  has 
access  to  Bob.  We  call  this  combination  of  a  subject  and  its  file  an  entity.  Moreover,  it  is  obvious 
that  one  entity  has  access  to  oneself  (i.e. ,  one’s  own  file),  and  if  Alice  can  access  Bob,  and  Bob  can 
access  Charles,  then  Alice  can  access  Charles  indirectly.  Recall  that  one  restriction  on  the  ACL  is 
that  if  Alice  can  access  Bob  then  Bob  cannot  access  Alice,  we  arrive  at  the  following  definition  of 
a  secure  system  as  specified  with  a  restricted  ACL. 

A  secure  system  is  an  ACL  in  the  form  of  G  =  <  V,  A  >  where  V  is  a  set  of  entities  and  A  is  a 
binary  relation  “access”  on  V  that  is  reflexive,  transitive,  and  antisymmetric. 

Graphically,  we  can  view  a  system  as  an  acyclic  directed  graph.  V  is  the  set  of  vertices  and 
A  is  the  set  of  arcs — there  is  an  arc  leading  from  vertex  u  to  v,  denoted  by  {u,  v),  if  and  only  if 
A  contains  the  binary  relation  “u  access  v”.  The  direction  of  the  arc  is  then  the  direction  of  the 
permitted  “access”. 

For  the  merger  example,  we  have  that  Res  =<  {Alice,  Bob,  Eve},  {{Alice,  Bob),  {Eve,  Alice)}  > 
and  Com  =<  {Charles,  Diana,  Fred},  {{Charles,  Fred),  {Diana,  Charles)}.  The  graphical  rep¬ 
resentation  of  both  systems  is  in  Figure  15.1. 


Figure  15.1:  Two  Separate  Systems 


For  convenience,  we  sometimes  do  not  distinguish  between  an  ACL  and  its  graphical  represen¬ 
tation  if  no  confusion  can  arise. 

We  say  that  an  access  {u,v)  is  legal  in  G  (or  in  A)  iff  there  is  a  directed  path  in  (the  graphical 
representation  of)  G  leading  from  u  to  v.  We  denote  this  with  {u,  v)  oc  G. 

Suppose  we  have  n  secure  systems,  G,  =<  Vi,  A,  >,  i  =  1,2  ,...,n,  and  for  simplicity,  we  assume 
that  all  entities  are  distinctly  named — that  is,  Vi  D  Vj  —  0,  i  ^  j.  To  facilitate  interoperation, 
mappings  between  entities  of  different  systems  must  be  introduced  to  reflect  the  desired  data 
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sharing  through  interoperation.  Such  mappings  can  be  represented  by  a  set  of  cross-system  “access” 
relations  F,  which  is  chosen  possibly  by  an  administrator  with  global  security  responsibility  or  by 
a  select  committee  in  charge  of  the  individual  systems.  Permitted  access  is  a  binary  relation  F  on 
UjLjVf  where  V(u,v)  E  F,  u  E  Vi,  v  E  Vj,  and  i  ^  j.  The  fact  that  (u,v)  E  F  indicates  that  it 
is  thought  that  entity  u  (in  system  G,)  should  be  allowed  to  access  entity  v  (in  system  Gj).  Note 
that  it  is  possible  to  have  both  (u,  v)  E  F  and  (v,u)  E  F. 

In  our  example,  suppose  that  it  is  decided  that  interoperation  should  allow  Bob  to  access  Fred 
(i.e.,  his  file)  and  Charles  to  access  Alice.  Then  the  global  system  is  in  Figure  15.2  where  arcs 
belonging  to  F  are  represented  as  dotted  lines. 


Figure  15.2:  Interoperation  of  Two  Systems 


The  interoperation  may  also  mandate  a  set  of  restricted  access  R,  which  is  a  binary  relation 
R  on  U"=1Vi  such  that  Vu,  v  E  R,  u  E  Vi,  v  E  Vj,  and  i  ^  j.  This  is  similar  to  a  negative  entry 
in  an  access  control  list  [46].  The  purpose  is  to  explicitly  safeguard  certain  parts  of  the  system 
when  the  potential  implications  of  introducing  F  are  unclear.  In  our  example,  we  may  forbid  access 
{Diana,  Eve).  R  takes  precedence  over  F. 

To  define  secure  interoperation  for  a  federated  system  Q  =  <  W,  B  > ,  recall  that  the  autonomy 
principle  requires  that  a  legal  access  in  A,  remain  legal  in  B,  i.e.,  if  (u,v)  ex  A,-  then  (u,  v)  <x  B. 
On  the  other  hand,  the  security  principle  requires  that  an  illegal  access  in  A,  remain  illegal  in  the 
interoperation,  i.e.,  if  (u,v)  ^  A,-  then  (u,v)  5^  B.  In  addition,  all  access  in  R  should  be  explicitly 
restricted — that  is,  B  0  R  =  0  (the  empty  set).  Q  is  a  secure  interoperation  if  B  D  R  =  0,  and 
Vu,  v  E  Vi,  ( u ,  v )  a  A,-  if  and  only  if  ( u ,  v)  oc  B. 

F  and  R  may  contradict  each  other,  and  other  security  violations  can  also  occur  as  a  result  of 
interoperation.  As  illustrated  in  Figure  15.3,  Bob  can  access  Alice  indirectly  through  Diana,  which 
is  illegal  within  the  research  organization. 

In  situations  like  this,  F  may  need  to  be  changed  or  reduced  to  remove  security  violations 
(recall  that  R  takes  precedence  over  F).  Thus,  given  G,, i  =  1  ,...,n,F,  and  R,  our  aim  is  to  find 
a  federated  system  Q  =<  W,  B  >,  where  W  =  U"=1  V,-  and  B  C  (U”=1  A,  U  F)  -  R,  such  that  Q  is  a 
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Figure  15.3:  Security  Violation  Caused  by  Interoperation 

secure  interoperation. 

15.3  Complexity 

For  convenient  discussion,  we  mark  all  arcs  belonging  to  Gi,i  =  green,  mark  all  arcs  in 

the  permitted  access  set  F  purple,  and  mark  all  arcs  in  the  restricted  access  set  R  red. 

The  first  problem  we  encounter  is  to  decide  if  a  given  interoperation  is  secure. 

Theorem  15.1  Security  evaluation  is  in  P. 

If  B  =  (U"=1  A,-  U  F)  -  R  is  insecure,  we  can  remove  the  security  violations  by  reducing  F  until 
the  resulting  interoperation  is  secure.  In  other  words,  find  S  C  F  such  that  C  =  (U "=1A;  U  5)  —  R 
is  secure.  This  is  trivial  because  5  =  0  is  definitely  a  secure  solution. 

To  find  nontrivial  secure  solutions,  one  choice  is  to  find  a  secure  solution  that  includes  all  other 
secure  solutions.  In  other  words,  find  5  C  F  such  that  C  =  U  5)  —  R  is  secure  and,  for 

any  secure  solution  T,  T  C  S.  Unfortunately,  such  solutions  do  not  always  exist,  as  is  shown  by 
the  following  counterexample. 

Consider  the  interoperation  of  G\  =<  {al,a2,a3},{(al,a2),(a2,  a3)}  >  and  G 2  =< 

{61, 62, 63},  {(61, 62),  (62, 63)}  >,  as  illustrated  by  Figure  15.4.  Suppose  F  = 
{(63,a2),(a3,62)},  which  obviously  causes  a  security  violation  because  access  (a3,a2) 
is  legal  in  the  federated  system  but  illegal  in  G\.  One  secure  solution  is  5 1  =  {(a3,62)}. 
Another  secure  solution  is  S2  =  {(63,  a2)}.  But  any  solution  containing  both  S\  and  5 2 
contains  F,  which  causes  a  security  violation. 

An  alternative  in  finding  nontrivial  secure  solutions  is  to  look  for  solutions  that  cannot  be 
expanded  any  further.  In  other  words,  find  a  secure  solution  5  C  F  such  that,  for  any  secure 
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Figure  15.4:  All-Inclusive  Solutions  May  Not  Exist 


solution  T,  S  <£T.  This  problem  is  in  P,  as  the  following  polynomial-time  algorithm  demonstrates: 
start  with  an  empty  solution  5;  add  elements  in  F  to  S  one  by  one,  and  only  if  the  addition  will 
not  cause  a  security  violation  (recall  that  security  evaluation  is  in  P);  repeat  this  process  until  no 
more  elements  can  be  added.  The  correctness  of  this  algorithm  is  obvious. 

The  three  choices  described  so  far  do  not  give  natural  optimality  measures.  For  example,  a 
solution  may  turn  out  to  contain  just  one  arc  from  F  although  the  exclusion  of  this  single  arc 
would  allow  the  addition  of  two  other  arcs,  with  the  latter  intuitively  facilitating  more  information 
exchange. 

Therefore,  we  propose  two  definitions  that  are  more  natural.  From  now  on,  we  stipulate  that 
F  ^  0  because  the  secure  interoperation  problem  disappears  when  F  =  0  (and  thus  R  =  0). 

One  natural  optimality  measure  is  to  maximize  direct  information  sharing.  Take  the  interoper¬ 
ation  represented  in  Figure  15.3,  for  example.  Arcs  a  and  d  (or  c  and  d )  cause  a  security  violation. 
To  reduce  a  minimum  number  of  arcs  from  F,  it  is  better  to  remove  d  so  that  both  a  and  c  can  be 
preserved. 

Theorem  15.2  Maximum  secure  interoperation  is  NP-complete. 

So  far  we  have  been  working  to  find  maximum  subsets  of  F  that  result  in  secure  interoperation, 
and  Theorem  15.2  suggests  that  this  is  hard. 

Another  natural  measure  of  optimality  is  to  maximize  direct  and  indirect  information  sharing  by 
working  on  the  whole  federated  system.  The  aim  is  to  find  a  secure  interoperation  with  a  maximum 
number  of  legal  access,  instead  of  looking  for  a  secure  solution  F  of  a  maximum  size.  That  is,  we 
can  now  change  F  as  long  as  the  new  F  does  not  introduce  an  access  that  is  illegal  under  the  initial 
set  F. 

Take  the  interoperation  represented  in  Figure  15.3  again,  for  example.  Arcs  a  and  d  (or  c  and 
d)  cause  a  security  violation.  Previously,  for  a  solution  with  maximum  size,  it  was  better  to  remove 
d  so  that  both  a  and  c  could  be  preserved.  Now  to  obtain  maximum  access,  it  is  actually  better  to 
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remove  both  a  and  c  to  preserve  d  because  the  latter  facilitates  more  (albeit  indirect)  information 
sharing. 

Theorem  15.3  Maximum-access  secure  interoperation  is  NP-complete. 

The  above  results  show  that  the  problems  we  are  investigating  are  NP-complete  in  general. 
Nevertheless,  we  have  found  a  simplified  case  where  each  every  G,  is  a  total  order. 

Theorem  15.4  Simplified  maximum-access  secure  interoperation  is  in  P. 

The  above  theorem  is  very  encouraging  and  more  polynomial-time  solvable  subcases  would  be 
desirable. 

15.4  Composability 

To  reduce  the  total  complexity  of  finding  maximum  secure  interoperation,  one  area  for  exploration 
is  the  topology  of  system  interoperation.  In  some  federated  systems,  for  example,  interoperation 
is  accomplished  by  having  a  master  system  interacting  with  other  systems  in  local  interoperation 
[51].  We  now  prove  that  in  such  a  configuration,  the  global  interoperation  is  secure  if  and  only  if 
each  local  interoperation  is  secure. 

Given  systems  Gi  =<  Vi,A{  >,i  =  0,1,..., n,  where  Go  is  the  master  system,  let  Go,i  —< 
Go,G{,Fi  >  denote  the  local  interoperation  between  Go  and  Gi  with  permitted  access  set  F% ,  i  — 
1, . . . ,  n.  The  global  system  is  thus  G'  =<  U”_0Vj,  (U"_0.Ai)  u  (UjLjJi). 

Theorem  15.5  G'  is  secure  if  and  only  if  Go,i  is  secure,  i=  1  ,...,n. 

This  theorem  implies  that  local  secure  interoperation,  and  thus  local  maximization,  can  be 
computed  independently  and  in  parallel. 

Corollary  15.6  G'  is  a  maximum  secure  interoperation  if  and  only  if  Go,,'  is  a  maximum  secure 
interoperation,  i  =  1, . . . ,  n. 

The  two  very  positive  results  above  indicate  that  in  a  star-like  configuration,  global  (maximum) 
secure  interoperation  can  be  achieved  in  a  distributed  fashion,  locally,  and  incrementally  as  more 
systems  join  the  interoperation.  We  can  thus  say  that  (maximum)  secure  interoperation  is  compos- 
able.  Note  that  these  results  do  not  necessarily  imply  that  maximum- access  secure  interoperation 
is  composable. 

The  proofs  in  Theorem  15.5  clearly  extend  to  any  configuration  of  a  tree  structure  in  that 
if  all  local  interoperation  between  neighboring  systems  are  secure  or  maximum,  then  the  global 
interoperation  is  also  secure  or  maximum. 

Corollary  15.7  Secure  interoperation  and  maximum  secure  interoperation  are  composable  in  any 
tree- structure  configuration. 
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In  a  ring-structure  configuration  (or  any  configuration  containing  a  ring),  the  composability 
theorem  does  not  always  hold.  A  simple  counterexample  is  when  each  Fi  contains  only  one  arc; 
thus,  each  local  interoperation  is  secure,  but  the  collection  of  these  plus  a  green  arc  forms  a  cycle 
and  permits  an  illegal  access.  The  implication  is  that  secure  interoperation  can  be  joined  together 
as  long  as  no  ring  is  formed. 

From  the  proof  details,  we  expect  that  the  above  composability  results  generalize  beyond  the 
simple  access  control  structure  we  have  assumed  in  our  current  discussion. 
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Chapter  16 

Conclusion 


We  have  developed  a  formal  policy  framework  of  MAC  policies  in  multilevel  relational  databases. 
We  have  identified  seven  important  components  of  such  policies,  and  have  characterized  their 
desirable  properties. 

Besides  the  four  components  in  the  traditional  interpretation  of  MAC  policies  in  multilevel 
databases,  one  of  the  most  important  new  components  is  the  interpretation  policy.  By  mapping 
multilevel  relational  databases  to  multilevel  theories  and  structures,  the  superficial  syntactic  dif¬ 
ference  in  object  labels  is  abstracted  away,  and  the  semantic  difference  hidden  in  object  labels  is 
made  precise.  As  a  consequence,  the  interpretation  policy  makes  it  possible  to  compare  the  seman¬ 
tics  of  multiple  MAC  policies.  As  examples,  we  have  developed  natural  interpretation  policies  for 
multilevel  relational  databases  with  tuple-level  and  element-level  labeling  respectively,  which  have 
properties  that  are  commonly  recognized  as  desirable.  Based  on  these  policies,  we  have  provided 
practical  design  trade-offs  in  choosing  between  tuple-level  and  element-level  labeling. 

The  second  new  component,  the  view  policy,  specifies  the  upward  information  flow  requirements 
for  a  set  of  view  constraints.  A  view  policy  should  have  three  desirable  properties: 

1.  it  ensures  the  validity  of  view  constraints, 

2.  it  maximizes  upward  information  flow,  and 

3.  it  is  deterministic. 

As  an  example,  we  have  developed  a  view  policy  for  multilevel  relational  databases  with  tuple-level 
labeling,  where  the  view  constraints  consist  of  key-based  functional  and  referential  dependencies, 
which  has  all  the  desirable  properties  identified  above. 

The  third  new  component,  the  update  policy,  specifies  the  mechanisms  to  eliminate  inference 
channels  in  the  enforcement  of  a  set  of  labeling  constraints.  An  update  policy  should  also  have 
three  desirable  properties: 

1.  it  does  not  introduce  inference  channels, 

2.  it  does  not  affect  data  at  lower  or  incomparable  levels,  and 
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3.  it  does  not  cause  data  loss  at  higher  levels. 

Based  on  these  properties,  we  have  provided  practical  design  guidelines  for  the  appropriate  speci¬ 
fication  of  labeling  constraints,  whose  enforcement  would  not  jeopardize  secrecy  requirements.  As 
an  example,  we  have  developed  an  update  policy  for  multilevel  relational  databases  with  tuple- 
level  labeling  where  the  labeling  constraints  consist  of  polyinstantiation  and  referential  security 
properties,  which  has  all  the  desirable  properties  identified  above. 

Based  on  the  framework,  we  have  compared  the  MAC  policies  commonly  imposed  in  or  proposed 
for  multilevel  relational  databases.  Our  framework  could  be  used  to  capture  and  resolve  the  MAC 
policy  mismatches  in  the  secure  interoperation  of  heterogeneous  multilevel  databases.  As  an  initial 
step  in  this  direction,  we  have  investigated  the  secure  interoperation  of  multilevel  databases  whose 
MAC  policies  mismatch  in  the  lattice  component. 
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Part  III 

Appendix:  Prototype  System  Design 
and  Implementation 
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Chapter  17 

Introduction 


We  have  developed  a  query  mediation  concept  demonstration  prototype.  This  appendix  focuses  on 
the  prototype  itself.  First  a  short  general  description  of  the  prototype’s  functionality  is  provided 
in  Chapter  18.  Next,  some  information  about  the  implementation  is  given  in  Chapter  19.  Finally, 
the  demonstration  is  described  in  detail.  A  high-level  description  of  each  of  the  four  examples  that 
constitiute  the  demonstration  can  be  found  in  Chapter  20,  followed  by  an  annotated  transcript  of 
the  demonstration  that  fills  in  the  low-level  details  in  Chapter  22. 
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Chapter  18 

Prototype  Functionality 


The  mediator  prototype  performs  three  principal  functions.  First,  it  translates  queries  expressed  in 
database  query  languages  into  the  mediator’s  logic-based  internal  representation,  and,  conversely, 
translates  queries  expressed  in  the  internal  representation  into  database  query  languages.  Second, 
it  transforms  a  query  on  one  database  into  a  collection  of  queries  on  other  databases  that,  when 
evaluated,  will  provide  data  relevant  to  the  original  query.  Third,  it  translates  the  tables  produced 
by  evaluating  the  collection  of  queries  into  a  table  of  responses  to  the  original  query. 

18.1  Query  Translation 

The  present  prototype  translates  two  dialects  of  SQL,  Standard  SQL  and  the  version  of  SQL  used 
by  Oracle.1  into  the  internal  representation  language.  (Only  queries  of  the  form 

SELECT  select-clause-list* 

FROM  relation-list* 

WHERE  conjoined-equations-and-inequalities* ; 

are  translated  by  the  prototype,  because  the  query  transformation  function  implemented  in  the 
prototype  can  only  handle  queries  of  this  form.)  The  query  is  first  parsed,  producing  an  abstract 
syntax  tree,  and  then  additional  syntactic  checking  is  performed,  using  the  mediator’s  internal 
representation  of  the  originating  database’s  schema. 

Prior  to  the  translation  into  logic,  the  query  is  minimized.  Minimization  is  a  way  of  eliminating 
inessential  dependencies  built  into  the  schema.  For  example,  suppose  that  the  originating  database 
contains  a  relation  R  with  attributes  A0,  Ai,  and  A2  where  A0  is  a  primary  key.  The  relation  R 
could  be  broken  down  into  three  relations 

R0  =  {{Ao  !->■  z0)  :  for  some  X\  and  x2,  { Ao  >-*  xo,Ai  i-»  Xi,  A2  >-*■  x2}  €  R} 

Ri  =  {{A0  »->•  z0,  At  t-»  zi}  :  for  some  x2,  {A0  z0,  Ay  i-+  xi,A2  i-s-  x2 }  €  R} 

R2  =  {{A0  xq,A2  i— >  x2}  :  for  some  {A0  z0,  A\  zi,  A2  x2 }  G  R} 

1A11  product  and  company  names  mentioned  in  this  report  are  trademarks  of  their  respective  holders. 
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without  loss  of  information.  If  a  query  Q  that  involves  R  can  be  rewritten  as  a  query  involving  only 
one  of  Ri  and  jR2,  the  likelihood  of  finding  information  relevant  to  Q  in  another  database  is  greatly 
increased.  For  example,  a  target  database  might  well  contain  a  relation  R\  with  tuples  semantically 
relevant  to  R\  but  no  information  semantically  relevant  to  -R2.  (An  additional  advantage  is  that 
decomposing  queries  into  multiple  queries  over  different  databases  becomes  easier.)  Minimization 
consists  of  rewriting  the  query  to  use  such  minimal — i.e.,  minimal  relative  to  the  semantic  con¬ 
straints  imposed  by  the  keys — relations.  For  the  relation  R  above,  this  would  be  done  by  rewriting 
references  to  attributes 

R.Ai  — *  Ri.Ai  (i  e  {0,1,2}) 


and  adding  the  constraint 

Rq.Aq=R{.Aq 

to  the  WHERE  clause  if  Ri  is  mentioned  (i  G  {1,2}),  so  that  joins  are  performed  when  necessary. 

The  process  of  translating  a  minimized  query  into  the  logical  language  is  conceptually  straight¬ 
forward:  the  FROM  and  WHERE  clauses  of  the  query  supply  the  matrix  of  the  formula,  and  then  the 
variables  in  the  matrix  are  existentially  quantified  unless  the  corresponding  attributes  are  mentioned 
in  the  SELECT  clause.  For  example,  the  query 


SELECT  R.A 
FROM  R,  S 
WHERE  R.B=S.C; 


where  the  scheme  of  R  is  {A,  B}  and  the  scheme  of  S  is  {C,  D},  would  be  translated  to 


3xB  3xc  3zd  [R(a:A,a:B)  A  S(xc,xD)  A  xB  =  zc] 


18.2  Query  Transformation 

After  translation,  logical  simplification  is  performed.  Minimization  often  introduces  considerable 
redundancy,  which  can  profitably  be  eliminated  prior  to  attempts  to  transform  the  query.  In  terms 
of  the  earlier  example,  where  relation  R  is  replaced  by  relations  Ro,  Ri  and  i?2,  the  mediator  can 
use  the  fact  that  R0  is  implied  by  Rt  (i  €  {1,2})  to  simplify  a  formula  such  as 

_R0(z)  A  Ri(y,  z)Ax  =  yAx<k 


to 


Ri(y,z)Ay  <  k 


Simplification  rules  for  eliminating  redundancy  due  to  minimization  are  generated  automatically. 
If  a  database  scheme  contains  additional  semantic  redundancy,  which  is  not  uncommon  in  practice, 
additional  rules  can  be  added  to  the  simplifier.  Generally,  if,  for  some  formulas  (f>  and  ip ,  the 
simplifier  is  told  that  <p  implies  ip,  then  it  will  simplify 


<p  A  ip[x/x’,  y/y', . . . ,  z/z'}  Ax  =  x'  Ay  =  y'  A  ■  •  ■  A  z  =  z'  A  /\T 
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to 

4>  A  f\T[x'/x,  y'  /  y.,...-,  z'  /  z] 

After  simplification,  rewrite  rules  are  applied  in  an  attempt  to  replace  the  vocabulary  of  the 
originating  database  with  the  vocabulary  of  some  target  database.  A  rewrite  rule  either  replaces 
a  term  by  a  term  or  replaces  a  conjunction  of  literals — which  can  be  thought  of  as  a  complex 
predicate — by  a  conjunction  of  literals.2  Examples  of  typical  transformation  rules  can  be  found  in 
the  discussion  of  the  examples  in  Chapter  20.  The  most  complex  and  interesting  rule  appears  in 
the  fourth  example.  It  has  the  form 


x  =  k  — >•  x  =  a(k ) 

where  a  is  code  that  runs  a  query  involving  the  constant  term  k  on  some  database  and  extracts  a 
constant  term,  represented  here  by  a(k),  from  the  response.  Effectively,  the  rule  says  to  replace  k 
by  some  other  term  a(k )  determined  by  consulting  some  database. 

After  all  the  rules  for  transforming  queries  on  the  originating  database  to  the  target  database 
have  been  applied,  either  all  the  vocabulary  of  the  originating  database  has  been  replaced  by  the 
vocabulary  of  the  target  database,  or  some  remains.  If  all  the  originating  vocabulary  has  been 
replaced,  then  we  have  a  query  on  the  target  database,  which  is  then  translated  to  (the  appropriate 
version  of)  SQL,  ”de- minimized”,  and  executed  to  obtain  additional  data  relevant  to  the  original 
query.  If  some  of  the  originating  vocabulary  remains,  we  have  a  query  that  cannot  be  executed 
on  the  target  database.  In  this  case,  we  have  two  alternatives.  First,  we  can  simply  abandon  the 
attempt  to  retrieve  relevant  data  from  the  target.  In  the  prototype,  this  alternative  is  chosen  when 
the  attempt  to  rewrite  the  query  has  no  effect  on  it,  that  is,  when  none  of  the  rules  were  applicable. 
Second,  we  can  attempt  to  build  on  partial  success  by  applying  the  rules  for  transforming  queries  on 
the  originating  database  to  some  other  target  database  to  the  partially  transformed  query.  If  these 
rules  eliminate  the  remnants  of  the  original  vocabulary,  the  result  is  a  query  over  the  combination 
of  the  two  databases. 

In  theory,  this  process  could  be  repeated,  resulting  in  a  query  over  some  combination  of  a  large 
number  of  target  databases.  In  practice,  the  combinatorics  prohibit  complex  combination;  even 
exhaustive  exploration  of  all  pairs  of  targets  is  too  expensive,  given  the  small  probability  that  any 
given  pair  will  yield  useful  information.  Therefore,  the  mediator  contains  knowledge  of  which  other 
targets  should  be  considered  in  the  case  of  partial  success  as  part  of  its  control  strategy. 

The  query  over  the  combined  databases  must  then  be  broken  down  into  queries  over  the  several 
individual  databases,  plus  “glue”  for  combining  the  results  of  executing  those  queries.  Sorting  the 
query’s  conjuncts  is  straightforward.  A  literal  involving  a  relation  on  one  of  the  databases  becomes 
part  of  the  query  on  that  database.  If  an  equation  or  inequality  is  between  two  terms  associated 
with  a  database,  it  becomes  part  of  the  query  on  that  database.  Otherwise,  it  relates  terms  across 
databases  and  is  reserved  for  “gluing”  the  results  together.  See  the  third  example  in  Chapter  20  for 
a  case  where  an  equation  that  corresponded  to  an  equijoin  in  the  first  example  is  used  as  “glue”. 

2  Note  that  the  formula  produced  by  translating  the  query  is  an  existentially  quantified  conjunction  of  literals. 
Even  when  more  complex  WHERE  clauses  are  eventually  supported,  the  matrix  of  the  formula  will  still  be  in  disjunctive 
normal  form,  and  so  the  present  rewrite  rules  can  still  be  applied  disjunct  by  disjunct. 


84 


The  correctness  criterion  for  rules  is  that  the  formula  on  the  right-hand-side  should  imply, 
given  the  semantics  of  the  relations  and  terms,  the  formula  on  the  left-hand-side.  This  means  that 
any  sequence  that  satisfies  the  transformed  query  will  satisfy  the  original,  relative  to  the  intended 
semantics.  Therefore,  the  tuples  returned  by  executing  the  transformed  query  are  relevant  to  the 
original  query,  in  the  sense  that  the  user  has  asked  for  all  data  satisfying  a  certain  semantic  property, 
and  these  data  have  that  property. 

18.3  Table  Translation 

If  query  transformation  produces  a  single  query,  table  translation  is  simply  a  matter  of  applying 
the  term  rewriting  rules  “backward”,  to  replace  the  terms  of  the  target  database  by  the  terms  of 
the  originating  database,  and  reordering  the  columns  (if  necessary).  The  resulting  table  looks  like 
a  response  to  the  original  query,  which  can  be  presented  to  the  user  as  additional  data.  If  the 
transformation  produces  multiple  queries  and  “glue”,  the  resulting  tables  must  first  be  combined 
using  that  glue.  The  third  example  in  Chapter  20  illustrates  using  an  equation  to  combine  tables, 
that  is,  to  perform  an  equijoin. 
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Chapter  19 

Prototype  Software 


The  mediator  runs  as  a  Lisp  process  under  Unix,  communicating  with  database  managements 
systems  (DBMSs)  and  DBMS  user  interfaces  via  ASCII  files.  For  example,  in  the  demonstration, 
a  wrapper  around  a  database  system  writes  the  user’s  query  to  a  file.  The  mediator  uses  the  file 
name  to  determine  which  wrapper  sent  the  query,  i.e.,  which  database  the  query  is  on.  It  reads  the 
query,  and  writes  any  resulting  queries  to  files  that  are  monitored  by  the  various  DBMS  wrappers. 
The  same  process  is  used  to  pass  tables  between  wrappers  and  the  mediator.  Periodic  polling  of 
ASCII  files  is  used  rather  than  some  more  direct  form  of  Unix  interprocess  communication  in  order 
to  achieve  greater  operating  system  independence.  Any  system  that  can  exchange  files  with  the 
system  running  the  mediator  can  host  a  mediated  database. 

The  mediator  code  resides  in  a  single  Lisp  package,  called  MERRIMACK.  The  source  code  is 
organized  as  a  collection  of  files  corresponding  to  data  structures  used  by  the  mediator  (e.g., 
formulas .  lisp,  which  defines  the  data  structures  that  implement  logical  formulas)  and  functions 
performed  by  the  mediator  (e.g.,  parse-sql.lisp,  which  contains  the  code  for  parsing  queries), 
together  with  a  system  definition  file  merrimack .  lisp  which  defines  the  MERRIMACK  package  and 
loads  the  other  files.  Code  specific  to  the  demonstration — database  schema  definitions,  the  partic¬ 
ular  transformation  rules  used,  etc. — is  in  the  file  demo. lisp.  An  image  containing  the  code  for 
the  demonstration  is  just  under  7.8  Mbytes. 
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Chapter  20 

Examples 


The  demonstration  consists  of  a  series  of  four  examples.  The  four  can  be  run  any  number  of 
times,  in  any  order,  but  are  naturally  ordered  by  the  complexity  of  the  functions  performed  by  the 
mediator.  Each  example  is  briefly  described  below;  the  details  can  be  found  in  Chapter  22,  which 
is  an  annotated  transcript  of  the  output  produced  by  the  mediator  during  a  demonstration. 


20.1  First  Example:  Basic  Query  Transformation 

Imagine  that  a  physician  working  at  a  clinic  has  recently  observed  an  unexpected  allergic  reaction 
of  a  patient  to  an  experimental  drug,  XD2001.  As  a  result,  she  decides  to  search  her  local  database 
for  information  about  other  recent  allergic  reactions  to  that  drug.  The  schema  for  the  relevant  part 
of  her  database1  is 


PATIENTS  PATIENT  JD  I  TRANSACTION-TIME 


PATIENT-ALLERGY  |[  PATIENT-ID  DRUG-NAME  NOTE-ID  START-TIME 


NOTES  II  NOTE-ID  I  TEXT 


Thus,  the  appropriate  SQL  query  on  this  database  is 

SELECT  PATIENT.ALLERGY.PATIENTJD,  TEXT 

FROM  PATIENT-ALLERGY,  PATIENTS,  NOTES 

WHERE  PATIENTA.LLERGY.PATIENTJD  =  PATIENTS.  PATIENT  JD 

AND  PATIENT-ALLERGY.NOTEJD  =  NOTES. NOTE  JD 

AND  DRUG-NAME  =  ’XD2001’ 

AND  ’l-JAN-94’  j  TRANSACTION-TIME; 

Unknown  to  our  physician,  data  relevant  to  her  query  are  stored  in  the  database  of  a  local 
hospital  that  is  joined  to  the  clinic  via  the  mediator.  Of  course,  there  are  differences  in  the  way 
that  the  information  is  represented,  which  means  that  the  same  query  cannot  be  used  to  retrieve 
it.  The  hospital’s  schema  is 

1  These  relations  schemes  were  extracted  from  the  THelper-II  database  of  Stanford  Medical  School. 
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ADMISSIONS  PATIENT_ID  ADMISSION-TIME  PATIENT-NAME 


DRUG_ALLERGY  PATIENT  JED  DRUG JD  TEXT 


Three  differences  between  the  two  databases  are  important  to  this  example. 


1.  The  hospital  uses  a  different  name  for  the  drug,  the  tradename  “Druggo”  rather  than  the 
scientific  designation  “XD2001”. 

2.  Notes  about  allergic  reactions  are  stored  in  the  relation  that  records  whether  a  reaction  has 
occurred  in  the  hospital  database,  while  all  notes  are  recorded  in  a  single  relation  in  the  clinic 
database  (presumably  to  save  space). 

3.  Most  importantly,  the  hospital  database  stores  less  precise  information  about  the  time  of  the 
treatment  that  caused  the  allergic  reaction  than  the  clinic  database  does.  While  the  clinic 
database  stores  the  transaction  time  associated  with  the  treatment,  the  hospital  database 
stores  only  the  admission  time  and  release  time  for  the  patient. 


So,  to  transform  this  query,  the  mediator  needs  rules  that 

1.  transform  XD2001  to  DRUGGO, 

2.  transform  a  join  of  PATIENT -ALLERGY  and  NOTES  to  DRUG_ALLERGY,  and 

3.  transform  a  request  for  information  with  a  TRANSACTION-TIME  later  than  some  given  time 
to  a  request  for  information  with  an  ADMISSION-TIME  later  than  that  given  time  (since  any 
transaction  at  the  hospital  must  occur  after  admission). 

This  example  demonstrates  the  query  rewriting  process  in  which  the  mediator  resolves  the 
semantic  and  representational  mismatches  between  the  clinic  database  and  the  hospital  database, 
using  its  knowledge  about  the  relationships  between  the  two  databases.  As  the  result,  our  physician 
at  the  clinic  is  able  to  access  relevant  data  in  both  databases  without  even  knowing  the  existence, 
schema,  and  semantics  of  the  hospital  database. 


20.2  Second  Example:  Reversed  Roles 


The  second  example  is  similar  to  the  first  in  terms  of  requirements  on  the  mediator’s  functionality, 
but  it  makes  the  point  that  the  connection  through  the  mediator  is  symmetric.  This  time,  a 
physician  at  the  hospital  wants  to  determine  whether  a  patient,  whose  ID  is  123-45-6789,  has  had 
a  platelet  count  performed.  So  he  issues  the  query 

SELECT  RUN-DATE 

FROM  TEST-ORDERED 

WHERE  PATIENT  JD  =  T23-45-6789’ 

AND  ORDERABLE.TEST-NAME  =  ’PLATELET  COUNT’; 

on  the  hospital  database,  where  the  relevant  part  of  the  database’s  schema  is 
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It  turns  out  that  the  patient  has  had  a  platelet  count  performed  at  the  clinic,  where  the  relevant 
part  of  the  schema  is 


LABJR.ESULT  I]  PATIENT  JD  I  TEST -NUMBER 


TESTS  II  TEST-NUMBER.  I  TEST-NAME 


Again,  the  organization  of  the  databases  is  somewhat  different — e.g.,  a  join  of  clinic  relations 
LAB -RESULT  and  TESTS  must  be  performed  to  determine  whether  a  test  with  a  given  name  has 
been  run  on  a  given  patient,  while  the  hospital  stores  the  information  in  the  single  relation 
TESTJ3RDERED — so  the  query  must  be  appropriately  transformed. 

This  example  demonstrates  that  the  mediator  is  able  to  perform  query  rewriting  in  both  direc¬ 
tions.  Either  the  clinic  database  or  the  hospital  database  can  serve  as  an  entry  point  for  users  to 
issue  queries,  and  users  can  access  data  in  both  databases  by  only  knowing  the  existence,  schema, 
and  semantics  of  one  database.  In  contrast,  in  the  federated  database  approach,  only  the  federated 
schema  can  serve  as  the  entry  point,  and  users  have  to  understand  the  federated  schema  and  its 
semantics  in  order  to  gain  access  to  multiple  databases.  As  the  result,  our  physician  at  the  hospital 
can  access  data  in  the  clinic  database  just  as  easily  as  physicians  at  the  clinic  accessing  data  in  the 
hospital  database. 


20.3  Third  Example:  Split  Query,  Join  Tables 

This  example  is,  externally,  quite  similar  to  the  first.  Exactly  the  same  query  is  processed  on  exactly 
the  same  clinic  database.  The  only  difference  on  the  hospital  side  is  that  the  relations  ADMISSIONS 
and  DRUG_ALLERGY  are  now  stored  in  different  databases,  so  that  the  information  we  need  cannot 
be  retrieved  using  a  single  query.  Instead,  two  queries  must  be  generated,  and  the  tables  that  result 
from  running  them  combined  by  the  mediator  prior  to  presentation  of  the  information  to  the  issuer 
of  the  original  query.  The  processing  steps  are  illustrated  graphically  in  Figure  20.1. 

This  example  demonstrates  the  ability  of  the  mediator  to  rewrite  a  query  in  one  database  to 
a  query  involving  multiple  databases.  Data  in  multiple  databases  are  combined  to  give  answer  to 
the  original  query.  As  the  result,  our  physician  at  the  clinic  gains  access  to  relevant  hospital  data 
without  knowing  which  hospital  databases  contain  relevant  data  and  how  to  combine  data  from 
them  in  a  semantically  meaningful  way. 


20.4  Fourth  Example:  Auxiliary  Queries 

In  the  final  example,  we  make  a  change  to  the  clinic  database:  rather  than  using  the  patient’s 
social  security  number  (SSN)  as  an  ID,  a  specially  generated  identifier  is  used.  The  correspondence 
between  these  identifiers  and  personal  data  about  the  patient,  including  the  patient’s  SSN,  is  stored 
in  another  database  that  is  accessible  only  to  authorized  users.  The  schema  for  this  restricted-access 
database  is 
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Query  on  Clinic  Database 


transformed  by  Mediator 


Queries  on  Hospital  Databases 


run  by  Hospital  DBMSs 


Replies  to  Queries  on  Hospital  Databases 


transformed  by  Mediator 


Additional  Data  Relevant  to  Original  Query 


Figure  20.1:  Example  Three  Processing 
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PERSONAL -DATA 


PATIENT  JD 


PATIENT-NAME 


SSN  ••• 

Our  physician  is  now  interested  in  allergic  reactions  experienced  by  one  of  her  patients,  who  is 
identified  as  Plllll  in  the  clinic  database  where  allergy  information  is  stored.  She  therefore  issues 
the  query 

SELECT  DRUG-NAME,  TEXT 

FROM  PATIENT-ALLERGY,  NOTES 

WHERE  PATIENT.ALLERGY.PATIENTJD  =  ’Plllll’ 

AND  PATIENT-ALLERGY.NOTEJD  =  NOTES. NOTE  JD; 

Just  as  in  the  first  example,  there  are  relevant  data  in  the  hospital  database,  but  an  additional 
difference  between  the  way  the  two  databases  store  information  is  reflected  in  the  patient  IDs. 
In  order  to  generate  an  equivalent  query  on  the  hospital  database,  the  mediator  must  determine 
Plllll’s  SSN  by  running  the  query 

SELECT  SSN 

FROM  PERSONAL-DATA 

WHERE  PATIENT  JD  =  ’Plllll’; 

on  the  restricted-access  database — provided,  of  course,  that  our  physician  is  allowed  access  to  that 
information — and  extracting  the  SSN  from  the  response.  These  processing  steps  are  illustrated 
graphically  in  Figure  20.2. 

This  example  demonstrates  the  capability  of  the  mediator  to  perform  mediation-based  access 
control.  The  clinic  database  is  unclassified,  the  restricted-access  database  is  secret,  while  the  hos¬ 
pital  database  is  multilevel  where  SSN  is  secret  and  other  data  elements  are  unclassified.  Without 
the  mediator,  our  physician  at  the  clinic  cannot  access  relevant  data  in  the  hospital  database,  even 
though  they  are  unclassified.  Through  the  mediator,  such  access  is  made  possible  without  compro¬ 
mising  the  secret  association  between  SSN  and  ID  stored  in  the  restricted-access  database  or  the 
secret  SSN  stored  in  the  hospital  database. 
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Query  on  Clinic  Database 


Query  on  Restricted- Access  Clinic  Database 
{generated  during  transformation ) 


Reply  to  Query  on  Restricted- Access  Clinic  Database 
{used  to  generate  query  on  Hospital  Database ) 


Query  on  Hospital  Database 


Reply  to  Query  on  Hospital  Database 


Additional  Data  Relevant  to  Original  Query 
Figure  20.2:  Example  Four  Processing 
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Chapter  21 

Lessons  Learned  in  Prototype 
Development 


With  regard  to  functionality,  the  principal  lesson  learned  was  that  this  basic  approach  to  query 
mediation  appears  feasible.  Although  only  a  subset  of  SQL’s  SELECT  queries  are  handled  by  the 
prototype’s  translation  and  transformation  functions,  both  the  query  translator  and  the  query 
transformer  can  be  straightforwardly  extended  to  deal  with  more  complex  queries. 

Many  rule-based  systems  do  not  scale  well  due  to  exponential  growth  in  the  amount  of  knowledge 
required  to  perform  effectively  in  less  restricted  domains.  There  is  good  reason  to  believe  that  the 
mediator  prototype  will  scale  relatively  well.  The  query  transformation  rules  encode  knowledge 
about  semantic  relationships  between  database  schemas  in  a  query  language-independent  form.  As 
a  result,  handling  larger  subsets  of  SQL  will  not  require  any  additional  transformation  rules,  because 
queries  will  still  be  translated  into  the  same  simple  logical  representation  prior  to  transformation. 
More  importantly,  there  is  no  sense  in  which  the  set  of  transformations  must  be  “complete”  in  order 
for  the  prototype  to  perform  effectively.  Each  transformation  represents  an  observed  semantic 
relationship  between  databases.  If  some  semantic  relationships  are  not  represented,  then  some 
relevant  data  will  not  be  returned  in  response  to  some  queries.  But  whatever  data  are  returned  will 
be  relevant;  missing  transformations  cannot  lead  to  incorrect  results.  Even  a  small  effort  devoted 
to  encoding  semantic  relationships  can  yield  significant  returns,  and  the  resulting  rule  set  can  be 
incrementally  extended  if  the  additional  benefits  seem  to  warrant  the  cost. 

Lisp  proved  to  be  a  good  choice  for  the  initial  prototyping  effort.  Its  advanced  facilities  for 
symbolic  programming  simplified  writing  many  of  the  functions  required  for  mediation,  such  as 
the  pattern  matcher  used  in  query  transformation.  An  additional  advantage  of  using  Lisp  is  that 
its  dynamic  nature  makes  generation  and  execution  of  code  at  query-transformation  time  easy. 
Performance  of  the  prototype  was  not  an  issue:  query  translation  and  transformation  had  to  be  ar¬ 
tificially  slowed  down  for  purposes  of  the  demonstration,  and  the  mediator’s  space  requirements  are 
quite  modest  by  modern  standards.  Use  of  file-based  communication  between  the  mediator  and  the 
DBMSs  was  a  reasonable  first  approximation  to  the  sort  of  asynchronous  file-based  communication 
over  the  Internet  used  by  applications  such  as  Mosaic  and  Netscape. 

Finally,  the  principal  lesson  learned  from  the  examples  was  that  writing  transformations  that 
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express  semantic  relationships  between  actual  schemas  is  quite  straightforward.  This  was  most 
graphically  illustrated  by  the  development  of  the  example  in  Section  20.2,  where  the  roles  of  the 
databases  were  reversed.  The  very  first  attempt  to  write  a  rule  relating  test  data  representation 
in  the  hospital  database  to  test  data  representation  in  the  clinic  database — simply  a  matter  of 
breaking  down  a  complex  relation  into  simpler  relations,  a  very  typical  sort  of  difference  in  data 
representation — was  sufficient  to  handle  this  example.  Although  our  experience  to  date  is  limited  to 
having  written  a  few  dozen  rules  relating  a  small  number  of  database  schemas,  it  strongly  suggests 
that  expressing  observed  semantic  relationships  in  our  formalism  will  not  be  difficult. 
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Chapter  22 

Transcript  of  Demonstration 


What  follows  is  a  transcript  of  the  output  produced  by  the  mediator  in  a  demonstration  run.  Text 
produced  by  the  mediator  is  in  teletype  font.  Text  entered  by  the  person  running  the  mediator 
is  in  italic  teletype.  Annotations  added  to  the  transcript  are  in  roman  italics. 

Start  the  mediator  by  typing  its  name  at  the  Unix  prompt. 

7,  merrimack 
Starting  Mediator  . . . 

The  mediator  is  waiting  for  input.  This  is  provided  by  creating  a 
file  that  contains  the  query  to  be  processed.  The  name  of  the  file  tells 
the  mediator  where  the  query  originated  and  what  other  databases  are 
candidates  for  providing  additional  data. 

Reading  SQL  from  file  "demo-in. sql"  ... 

“demo-in. sql”  indicates  that  the  query  is  from  the  Clinic 
database  and  that  the  Hospital  database  is  a  candidate. 

Abstract  syntax  tree  for  the  SQL  form  is: 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  PATIENT.ALLERGY  PATIENT.ID) 

(: alias  HIL)) 

(: select-item  (: attribute  NIL  TEXT) 

(: alias  NIL))) 

(: relations 

( : relation  PATIENT. ALLERGY) 

(:relation  PATIENTS) 

(: relation  NOTES)) 

( : constraints 
( : constraint 

(: predicate  =) 

( : attribute  PATIENT. ALLERGY  PATIENT.ID) 
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(: attribute  PATIENTS  PATIENT_ID) ) 

( : constraint 

(predicate  =) 

(: attribute  PATIENT. ALLERGY  NOTE.ID) 

(.•attribute  NOTES  NOTE.ID) ) 

( : constraint 
(ipredicate  =) 

( : attribute  NIL  DRUG.NAME) 

(: literal-value  XD2001)) 

( : constraint 

(:predicate  >) 

(: attribute  NIL  TRANSACTION.TIME ) 

(: literal-value  01-JAN-94)))) 

Continue?  (Y  or  N) :  y 

When  run  in  interactive  mode,  the  mediator  pauses  at  the  end  of  each 
processing  stage  and  provides  the  option  of  continuing  the  demo  or  not. 

Filling  in  omitted  attributes  and  aliases  in  SQL  tree  . . . 
Completed  abstract  syntax  tree  for  the  SQL  form  is: 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  PATIENT.ALLERGY  PATIENT.ID) 
(: alias  PATIENT.ID) ) 

(: select-item  (: attribute  NOTES  TEXT) 

(: alias  TEXT))) 

(: relations 

( relation  PATIENT. ALLERGY) 

(: relation  PATIENTS) 

( : relation  NOTES)) 

( : constraints 
( : constraint 

(: predicate  =) 

( : attribute  PATIENT.ALLERGY  PATIENT.ID) 

(: attribute  PATIENTS  PATIENT.ID)) 

( : constraint 

( predicate  =) 

(: attribute  PATIENT.ALLERGY  NOTE.ID) 

(: attribute  NOTES  NOTE.ID)) 

( : constraint 

(: predicate  =) 

( : attribute  PATIENT.ALLERGY  DRUG.NAHE) 

(: literal-value  XD2001)) 

( : constraint 

(:predicate  >) 

(: attribute  PATIENTS  TRANSACTION.TIME) 

(: literal-value  01-JAN-94) ) ) ) 

Continue?  (Y  or  N):  y 
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Converting  query  to  minimized  representation  . . . 

Minimized  abstract  syntax  tree  is: 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  PATIENT.ALLERGY  PATIENT.ID) 

(: alias  PATIENT_ID) ) 

(: select-item  (: attribute  NOTES_TEXT  TEXT) 

(: alias  TEXT))) 

(relations 

(: relation  PATIENT. ALLERGY) 

(:relation  PATIENTS) 

(:relation  NOTES) 

(: relation  NOTES.TEXT) 

( : relation  PATIENTS_TRANSACTION_TIME) ) 

( : constraints 
( : constraint 

( predicate  =) 

( : attribute  PATIENT. ALLERGY  PATIENT.ID) 

(: attribute  PATIENTS  PATIENT.ID)) 

( : constraint 

(: predicate  =) 

( : attribute  PATIENT.ALLERGY  NOTE.ID) 

(: attribute  NOTES  NOTE.ID)) 

( : constraint 

(: predicate  =) 

(: attribute  PATIENT.ALLERGY  DRUG.NAME) 

(: literal-value  XD2001)) 

( : constraint 

(: predicate  >) 

( : attribute  PATIENTS.TRANSACTION.TIME  TRANSACTION.TIME) 
(: literal-value  01-JAN-94)) 

( : constraint 

(: predicate  =) 

(: attribute  NOTES  NOTE.ID) 

( : attribute  NOTES.TEXT  NOTE.ID)) 

( : constraint 

(: predicate  =) 

(: attribute  PATIENTS  PATIENT.ID) 

(: attribute  PATIENTS.TRANSACTION.TIME  PATIENT.ID)))) 
Continue?  (Y  or  N):  y 

Translating  SqL  query  to  logical  formula  . . . 

Logical  form  of  query  is: 

(E  ?patient_allergy .note.id) 

(E  ?patient_allergy .drug.name) 

( E  ?pat ients .transact ion.t ime . transact ion.t ime ) 

(E  ?notes. note.id) 
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(E  ?notes_text .note.id) 

(E  ?patients. patient.id) 

(E  ?patients_transaction_time. patient.id) 

\/{  /\{  Patient .Allergy ( 

?patient_allergy.patient_id, 

?pat ient.allergy . drug.name , 

?patient_allergy .note.id) , 

Patients (?patients .patient.id) , 

Notes (?notes .note.id) , 

Notes.Text (?notes_text . not e_id ,  ?notes_text .text) , 
Patients_Transaction_Time( 

?patients_transaction_time .patient.id, 
?patients_transaction_time . transaction.time) , 

?patient .allergy .patient.id  =  ?patients .patient.id, 
?patient_allergy .note.id  =  ?notes.note_id, 
?patient_allergy.drug_name  =  xd2001, 

?patients_transaction_time.transaction_time  >  01-jan-94, 

?notes .note.id  =  ?notes_text .note.id, 

?patients. patient.id  =  ?patients_transaction_time. patient.id}} 
where  the  f ree  variables  of  the  formula  are  associated  with  query  attribute 
aliases  as  follows: 

7PATIENT. ALLERGY. PATIENT.ID  <— >  PATIENT.ID 
?NOTES_TEXT . TEXT  <-->  TEXT 
Continue?  (Y  or  N):  y 

Simplifying  logical  formula  . . . 

Simplified  formula  is: 

(E  ?patients .patient.id) 

(E  ?patient_allergy .note.id) 

(E  ?notes .note.id) 

(E  ?patient .allergy. drug.name) 

(E  ?patients_transaction_time . transaction.time) 

\/{  /\{  Patient. Allergy ( 

?patient_allergy .patient.id, 

?patient_allergy .drug.name , 

?patient_allergy .note.id) , 

Notes_Text(?notes .note.id,  ?notes_text . text) , 

Pat ient  s.Transact ion.Time ( 

?patients .patient.id, 

?patients_transaction_time . transaction.time) , 

?patient_allergy .patient.id  =  ?patients .patient.id, 
Tpatient.allergy .note.id  =  ?notes .note.id, 

?patient_allergy .drug.name  =  xd2001, 

?patients_transaction_time. transaction.time  >  01-jan-94}} 
Continue?  (Y  or  N):  y 

Attempting  to  derive  queries  on  remote  databases  . . . 
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Attempting  to  derive  query  on  HOSPITAL_DATABASE  . . . 

Succeeded! 

The  rules  that  were  used  in  this  case  are 

Patients(p )  — >  Admissions(p,  ? admissions. admission-time), 

Patient-Allergy(p,  d,  rii)  A  Notes.Text{ni,  i)  A  Hi  =  ri2 

— *  Drug-Allergy-Text(p,  d,  x), 

Patients-Transaction.Time(p,  t% )  A  <2  <  <1  — *  Admissions(p,  t\)  A  <2  <  ^1  > 
and 

xd2001  — +  druggo. 

Logical  form  of  derived  query  is: 

(E  ?drug_allergy_text.drug_id) 

(E  ? admissions .patient_id) 

(E  ?admissions.admission_time) 

\/{  /\{  ?drug_allergy_text .patient.id  =  ?admissions.patient_id, 
?drug_allergy_text .drug_id  =  druggo, 

Drug_Allergy_Text ( 

?drug_allergy_text .patient_id, 

?drug_allergy_text . drug_id , 

?drug_allergy_t ext. text) , 

Admissions (?admissions. patient.id,  ? admissions . admission_time) , 
?admissions.admission_time  >  01-jan-94» 
where  the  variables  in  the  two  formulas  are  associated  as  follows: 

?PATIENT_ALLERGY . PATIENT_ID  <-->  ?DRUG_ ALLERG Y.TEXT. PATIENT. ID 
?P ATIENT. ALLERGY. DRUG.NAME  <— >  ?DRUG_ALLERGY_TEXT . DRUG.ID 
?NOTES_TEXT . TEXT  <-->  ?DRUG_ALLERGY_TEXT . TEXT 
’PATIENTS . PATIENT.ID  <-->  ?ADHISSIQNS.PATIENT_ID 
?PATIENTS_TRANSACTION_TIME . TRANS ACTION.TIME 
<— >  ? ADMISSIONS . ADHISSION.TIME 
Continue?  (Y  or  N):  y 

Translating  logical  formula  to  SQL  query  . . . 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  DRUG_ALLERGY_TEXT  TEXT) 

(: alias  TEXT)) 

(: select-item  (: attribute  DRUG_ALLERGY_TEXT  PATIENT.ID) 

(: alias  PATIENT.ID) ) ) 

(: relations 

(: relation  DRUG_ALLERGY_TEXT) 

(: relation  ADMISSIONS)) 

( : constraints 
( : constraint 

(:predicate  =) 

( : attribute  DRUG_ALLERGY_TEXT  PATIENT. ID) 

(: attribute  ADMISSIONS  PATIENT.ID) ) 

( : constraint 
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( : predicate  =) 

(: attribute  DRUG_ALLERGY_TEXT  DRUG_ID) 

(: literal-value  DRUGGO)) 

( : constraint 

(: predicate  >) 

(: attribute  ADMISSIONS  ADMISSION_TIME) 

(: literal-value  01-JAN-94) ) ) ) 

Continue?  (Y  or  N) :  y 

Converting  from  minimized  representation  to  actual  representation 
( : sql-tree 

( : select-items 

(: select-item  (: attribute  DRUG_ALLERGY  TEXT) 

(: alias  TEXT)) 

(: select-item  (:attribute  DRUG_ALLERGY  PATIENT_ID) 

(: alias  PATIENT.ID))) 

(: relations 

(: relation  DRUG_ALLERGY ) 

( : relation  ADMISSIONS)) 

( : constraints 
( : constraint 

(:predicate  =) 

( : attribute  DRUG_ALLERGY  PATIENT_ID) 

( : attribute  ADMISSIONS  PATIENT_ID)) 

( : constraint 

(: predicate  =) 

(attribute  DRUG.ALLERGY  DRUG_ID) 

(: literal-value  DRUGGO)) 

( : constraint 

(: predicate  >) 

( : attribute  ADMISSIONS  ADMISSION.TIME) 

(: literal-value  01-JAN-94) ) ) ) 

Continue?  (Y  or  N):  y 


Writing  SQL  query  on  remote  database  to  file  "demo-out . sql" .. . 
Query  has  been  written  to  file  "demo-out . sql" 

This  query  is  passed  to  the  Hospital’s  database  system  for  processing. 

The  resulting  table  will  be  written  to  the  file  “demo-in.tbl” . 

Continue?  (Y  or  N):  y 

Converting  table  in  file  "demo-in.tbl"  for  use  as  response  to 
original  query 

Will  write  result  to  "demo-out .tbl"  ... 

Table  read  from  file  "demo-in.tbl",  and  file  deleted 
Converted  table  has  been  written  to  file  "demo-out .tbl" 
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This  completes  the  first  phase  of  the  demonstration.  The  additional 
data  in  the  converted  table  can  now  be  presented  to  the  user  who  issues 
the  original  query. 

Reading  SQL  from  file  "demo-2-in. sql"  ... 

* demo-2-in.sql ”  indicates  a  query  from  the  Hospital,  and  that  the 
Clinic  database  is  a  candidate  for  additional  data. 

Abstract  syntax  tree  for  the  SQL  form  is: 

( :sql-tree 

( : select-items 

(: select-item  (: attribute  NIL  RUN_DATE) 

(: alias  NIL))) 

(: relations 

( : r elat ion  TEST_ORDERED ) ) 

( : constraints 
( : constraint 

(: predicate  =) 

( : attribute  NIL  PATIENT_ID) 

(: literal-value  123-45-6789)) 

( : constraint 

(: predicate  =) 

( : attribute  NIL  ORDERABLE_TEST_NAME) 

(: literal-value  PLATELET  COUNT)))) 

Continue?  (Y  or  N):  y 

Filling  in  omitted  attributes  and  aliases  in  SQL  tree  . . . 
Completed  abstract  syntax  tree  for  the  SQL  form  is: 

( : sql -tree 

( : select-items 

(: select-item  (: attribute  TEST_ORDERED  RUN_DATE) 

(: alias  RUN.DATE))) 

(relations 

(: relation  TEST.ORDERED) ) 

( : constraints 
( : constraint 

(: predicate  =) 

( : attribute  TEST.ORDERED  PATIENT.ID) 

(: literal-value  123-45-6789)) 

( : constraint 

(:predicate  =) 

(: attribute  TEST.ORDERED  ORDERABLE_TEST_NAME) 

(: literal-value  PLATELET  COUNT)))) 

Continue?  (Y  or  N):  y 

Converting  query  to  minimized  representation  . . . 

Minimized  abstract  syntax  tree  is: 
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( : sql-tree 

(:  select-items 

(: select-item  (: attribute  TEST_ORDERED_RUN_DATE  RUN.DATE) 

(: alias  RUN.DATE))) 

(: relations 

(:relation  TEST_ORDERED) 

( : relation  TEST_ORDERED_RUN_DATE) ) 

( : constraints 
( : constraint 

(:predicate  =) 

(: attribute  TEST_ORDERED  PATIENT.ID) 

(: literal-value  123-45-6789)) 

( : constraint 

(: predicate  =) 

(: attribute  TEST_ORDERED  ORDERABLE.TEST.NAME) 

(: literal-value  PLATELET  COUNT)) 

( : constraint 

(predicate  =) 

(: attribute  TEST.ORDERED  TEST_NUMBER) 

(attribute  TEST_ORDERED_RUNJDATE  TEST_NUMBER) ) 

( : constraint 

(predicate  =) 

( : attribute  TEST.ORDERED  PATIENT.ID) 

( : attribute  TEST_ORDERED_RUN_DATE  PATIENT. ID)) 

( : constraint 

(predicate  =) 

( : attribute  TEST.ORDERED  PHYSICIAN.ID) 

(: attribute  TEST.ORDERED.RUN.DATE  PHYSICIAN.ID)) 

( : constraint 

(rpredicate  =) 

( : attribute  TEST.ORDERED  ORDERABLE.TEST.NAME ) 

( : attribute  TEST.ORDERED.RUN.DATE  ORDERABLE.TEST.NAME)))) 
Continue?  (Y  or  N):  y 

Translating  SQL  query  to  logical  formula  . . . 

Logical  form  of  query  is: 

(E  ?test_ordered.test_number) 

(E  ?test_ordered_run_date . test.number) 

(E  ?test_ordered.patient_id) 

(E  ?test_ordered_run_date.patient_id) 

(E  ?test_ordered.physician_id) 

(E  ?test_ordered_run_date.physician_id) 

(E  ?test_ordered.orderable_test_name) 

(E  ?test_ordered_run_date.orderable_test_name) 

\/{  /\{  Test_Ordered( 

?test_ordered . test.number , 

?test_ordered.patient_id, 

?test_ordered . pbysician.id , 
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?test_ordered. orderable.test.name) , 

Test_Ordered_Run_Date( 

?t  e  s t_order ed_run_dat  e . t  e  s t .number , 

7test_ordered_run.dat  e . pat i ent  _ id , 

?t est_ordered_run.dat e. physician. id, 

?test_ordered_run_date .orderable.test.name, 
?test_ordered_mn_date.run_date) , 

?test_ordered.patient_id  =  123-45-6789, 

?test_ordered. orderable.test.name  =  platelet  count, 
?test_ordered.test_number  =  ?test_ordered_run_date.test_number, 
?test_ordered.patient_id  =  ?test_ordered_run_date.patient_id, 
?test_ordered.physician_id 

=  ?test_ordered_run_date .physician.id, 

?test_ordered. orderable.test.name 
=  ?test_ordered_run_date . orderable.test.name}} 
where  the  free  variables  of  the  formula  are  associated  with  query  attribute 
aliases  as  follows: 

?TEST_ORDERED_RUH_DATE . RUN.DATE  <— >  RUN.DATE 
Continue?  (Y  or  N):  y 

Simplifying  logical  formula  . . . 

Simplified  formula  is: 

(E  ?test_ordered.test_number) 

(E  ?test_ordered. physician.id) 

(E  ?test_ordered.patient_id) 

(E  ?test_ordered. orderable.test.name) 

\/{  /\{  Test_Ordered_Run_Date( 

?t est.ordered . test.number , 

?t est.ordered . pat ient.id , 

?test_ordered . physician.id , 

?test_ordered . orderable.test.name , 

?test_ordered_run_date .nm.date) , 

?test_ordered.patient_id  =  123-45-6789, 

Ttestiordered. orderable.test.name  =  platelet  count}} 

Continue?  (Y  or  N):  y 

Attempting  to  derive  queries  on  remote  databases  . . . 

Attempting  to  derive  query  on  CLINIC.DATABASE  . . . 

Succeeded! 

The  rule  that  was  used  in  this  case  is 
Test. Ordered  Jlun.Date{n ,  p,  d,  o,  t) 

— ►  Lab.Result(p,  r»i)  A  Tesis.TestJTame(n2,  o)  A  ni  =  ri2- 

Logical  form  of  derived  query  is: 

(E  ?lab_result .patient.id) 

(E  ?tests_test_name.test_name) 
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(E  ?tests_test_name.test_number) 

(E  ?lab_result.test_number) 

\/{  /\{  ?lab_result .patient.id  =  123-45-6789, 

?tests_test_name .test.name  =  platelet  count, 

Lab_Result( 

?lab_result .patient_id, 

?lab_result . test.nurober , 

?lab_result.transaction_time) , 

Tests_Test_Name( 

?tests_test_name.test_number, 

?tests_test_name.test_name) , 

?tests_test_name.test_number  =  ?lab_result.test_number}} 
where  the  variables  in  the  two  formulas  are  associated  as  follows: 
?TEST_ORDERED . PATIENT_ID  <— >  ?LAB_RESULT . PATIENT_ID 
?LAB_RESULT . TEST_NUHBER  <-->  ?LAB_RESULT . TEST_ NUMBER 
?TEST_ORDERED_RUN_DATE . RUN.DATE  <-->  ?LAB_RESULT . TRANSACTIQN.TIME 
7TESTS . TEST_NUHBER  <-->  ?TESTS_TEST_NAME . TEST_NUMBER 
?TEST_ORDERED . ORDERABLE_TEST_NAME  <— >  ?TESTS_TEST_NAME . TEST_NAME 
Continue?  (Y  or  N):  y 

Translating  logical  formula  to  SQL  query  ... 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  LAB_RESULT  TRANSACTION_TIME) 

(: alias  RUN_DATE) ) ) 

(: relations 

(: relation  LAB_RESULT) 

(.•relation  TESTS_TEST_NAHE) ) 

( : constraints 
( : constraint 

(: predicate  =) 

( : attribute  LAB_RESULT  PATIENT_ID) 

(: literal-value  123-45-6789)) 

( : constraint 

(:predicate  =) 

( : attribute  TESTS_TEST_NAME  TEST.NAHE) 

(: literal-value  PLATELET  COUNT)) 

( : constraint 

(: predicate  =) 

(: attribute  TESTS_TEST_NAME  TEST_NUHBER) 

( : attribute  LAB .RESULT  TEST.NUHBER) ) ) ) 

Continue?  (Y  or  N):  y 

Converting  from  minimized  representation  to  actual  representation  . . . 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  LAB.RESULT  TRANSACTION.TIHE) 

(: alias  RUN.DATE) ) ) 
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(relations 

(: relation  LAB .RESULT) 

(: relation  TESTS)) 

(: constraints 
( : constraint 

^predicate  =) 

(: attribute  LAB.RESULT  PATIENT.ID) 

( : literal-value  123-45-6789)) 

( : constraint 

(: predicate  =) 

(: attribute  TESTS  TEST.NAME) 

( :literal-value  PLATELET  COUNT)) 

( : constraint 

(: predicate  =) 

( : attribute  TESTS  TEST.NUMBER) 

(: attribute  LAB.RESULT  TEST.NUMBER) ) ) ) 
Continue?  (Y  or  N):  y 


Writing  SQL  query  on  remote  database  to  file  "demo-2-out . sql" . . . 
Query  has  been  written  to  file  "demo-2-out . sql" 

Continue?  (Y  or  N):  y 

Converting  table  in  file  "demo-2-in.tbl"  for  use  as  response  to 
original  query 

Will  write  result  to  "demo-2-out .tbl"  ... 

Table  read  from  file  "demo-2-in.tbl" ,  and  file  deleted 
Converted  table  has  been  written  to  file  "demo-2-out . tbl" 

This  completes  the  second  phase  of  the  demonstration. 

Reading  SQL  from  file  "demo-3-in. sql"  ... 

This  indicates  another  query  from  the  Clinic,  but,  in  this  case,  the 
candidate  sources  of  data  are  two  databases  at  the  Hospital  which  jointly 
contain  the  same  data  in  the  Hospital  database  of  the  first  example.  In 
fact,  the  very  same  query  that  was  used  in  the  first  example  is  reused  here. 

Abstract  syntax  tree  for  the  SQL  form  is: 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  PATIENT. ALLERGY  PATIENT.ID) 

(: alias  NIL)) 

(: select-item  (: attribute  NIL  TEXT) 

(: alias  NIL))) 

(  relations 

( : r elat ion  PATIENT. ALLERGY ) 

(: relation  PATIENTS) 
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(: relation  NOTES)) 

( : constraints 
( : constraint 

(predicate  =) 

(: attribute  PATIENT.ALLERGY  PATIENT_ID) 

(: attribute  PATIENTS  PATIENT_ID) ) 

( : constraint 

(predicate  =) 

(: attribute  PATIENT. ALLERGY  NOTE. ID) 

(: attribute  NOTES  NOTE.ID) ) 

( : constraint 

(: predicate  =) 

(attribute  NIL  DRUG.NAME) 

(: literal-value  XD2001)) 

( : constraint 

(: predicate  <) 

(: literal-value  01-JAN-94) 

(: attribute  NIL  TRANSACTION.TIME ) ) ) ) 

Continue?  (Y  or  N):  y 

Filling  in  omitted  attributes  and  aliases  in  SQL  tree  . . . 
Completed  abstract  syntax  tree  for  the  SQL  form  is: 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  PATIENT.ALLERGY  PATIENT.ID) 
(: alias  PATIENT.ID)) 

(: select-item  (: attribute  NOTES  TEXT) 

(: alias  TEXT))) 

(: relations 

(: relation  PATIENT.ALLERGY) 

(: relation  PATIENTS) 

(: relation  NOTES)) 

( : constraints 
( : constraint 

Opredicate  =) 

(: attribute  PATIENT.ALLERGY  PATIENT.ID) 

( : attribute  PATIENTS  PATIENT.ID)) 

( : constraint 

(: predicate  =) 

( : attribute  PATIENT.ALLERGY  NOTE.ID) 

(: attribute  NOTES  NOTE.ID)) 

( : constraint 

Opredicate  =) 

(: attribute  PATIENT.ALLERGY  DRUG.NAME) 

(: literal- value  XD2001)) 

( : constraint 

(: predicate  <) 

(: literal-value  01-JAN-94) 
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(: attribute  PATIENTS  TRANSACTION.TIME) ) ) ) 

Continue?  (Y  or  N):  y 

Converting  query  to  minimized  representation  . . . 

Minimized  abstract  syntax  tree  is: 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  PATIENT_ALLERGY  PATIENT.ID) 

(: alias  PATIENT_ID) ) 

(: select-item  (: attribute  NOTES_TEXT  TEXT) 

(: alias  TEXT))) 

(: relations 

(: relation  PATIENT_ALLERGY ) 

( : relation  PATIENTS) 

(: relation  NOTES) 

(: relation  NOTES.TEXT) 

( relation  PATIENTS_TRANSACTION_TIME) ) 

( : constraints 
( : constraint 

(:predicate  =) 

( : attribute  PATIENT. ALLERGY  PATIENT, ID) 

(: attribute  PATIENTS  PATIENT.ID)) 

( : constraint 

(: predicate  =) 

( : attribute  PATIENT.ALLERGY  NOTE.ID) 

(: attribute  NOTES  NOTE_ID) ) 

( : constraint 

(: predicate  =) 

( : attribute  PATIENT_ALLERGY  DRUG_NAME) 

(: literal-value  XD2001)) 

( : constraint 

(:predicate  <) 

(: literal-value  01-JAN-94) 

(: attribute  PATIENTS_TRANSACTION_TIME  TRANSACTION.TIME)) 
( : constraint 

(: predicate  =) 

( : attribute  NOTES  N0TE_ID) 

(: attribute  NOTES.TEXT  NOTE. ID)) 

( : constraint 

^predicate  =) 

( : attribute  PATIENTS  PATIENT.ID) 

(: attribute  PATIENTS.TRANSACTION.TIME  PATIENT.ID)))) 
Continue?  (Y  or  N):  y 

Translating  SqL  query  to  logical  formula  . . . 

Logical  form  of  query  is: 

(E  ?patient_allergy .note.id) 

(E  ?patient_allergy .drug.name) 
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(E  ?patients_transaction_time.transaction_time) 

(E  ?notes .note_id) 

(E  ?notes_text .note_id) 

(E  ?patients.patient_id) 

(E  ?patients_transaction_time . patient_id) 

\/{  /\{  Patient_Allergy( 

?pat ient_allergy . patient_id , 

?patient_allergy . drug_name , 

?patient_allergy .note_id) , 

Patients (?patients ,patient_id) , 

Notes (?notes .note_id) , 

Notes_Text(?notes_text .note.id,  ?notes_text .text) , 
Patients_Transaction_Time( 

?patients_transaction_time .patient_id, 
?patients_transaction_time.transaction_time) , 
?patient_allergy.patient_id  =  ?patients .patient.id, 
?patient_allergy.note_id  =  ?notes.note_id, 

?patient_allergy .drug_name  =  xd2001, 

Oi-jan-94  <  ?patients_transaction_time . transact ion_time, 

?notes ,note_id  =  ?notes_text .note_id, 

?patients ,patient_id  =  ?patients_transaction_time.patient_id}} 
where  the  free  variables  of  the  formula  are  associated  with  query  attribute 
aliases  as  follows: 

?PATIENT_ALLERGY.PATIENT_ID  <-- >  PATIENT_ID 
?NOTES_TEXT . TEXT  <— >  TEXT 
Continue?  (Y  or  N):  y 

Simplifying  logical  formula  . . . 

Simplified  formula  is: 

(E  ?patients ,patient_id) 

(E  ?patient_allergy .note_id) 

(E  ?notes .note_id) 

(E  ?patient_allergy .drug_name) 

(E  ?patients_transaction_time. transaction_time) 

\/{  /\{  Patient_Allergy ( 

?patient_allergy . pat ient_id , 

?patient_allergy . drug_name , 

?patient_allergy .note_id) , 

Notes_Text(?notes .note_id,  ?notes_text . text) , 
Patients_Transaction_Time( 

?patients . patient_id, 

?patients_transaction_time . transact ion_time) , 

?patient_allergy . patient_id  =  ?patients .patient_id, 
?patient_allergy ,note_id  =  ?notes .note_id, 

?patient_allergy .drug_name  =  xd2001, 

01-jan-94  <  ?patients_transaction_time .transaction_time» 
Continue?  (Y  or  N):  y 
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Attempting  to  derive  queries  on  remote  databases  . . . 

Attempting  to  derive  query  on  H0SPITAL_DATABASE_1  . . . 

Partially  succeeded. 

Here  is  ihe  first  difference  between  the  two  examples.  The  subset  of 
the  full  Hospital  database  that  is  called  “ Hospital  database  1”  does  not 
contain  enough  information  that  any  query  on  it  provides  data  relevant  to 
the  original  query.  Thus,  the  attempt  to  transform  the  query  into  a  query 
on  the  partial  Hospital  database  only  partially  succeeds — some  of  the 
relations  in  the  Clinic  database  are  still  mentioned.  The  rules  that  were 
used  in  this  case  are 

Patients(p)  — »  Admissions(p,  ? admissions,  admission-time), 
Patients-Transaction-Time(p,  t\)  A  t-z  <  ti  — »  Admissions(p,  ti)  A  t?  <  ti 
and 

xd2001  — +  druggo. 

Logical  form  of  derived  query  is: 

(E  ?patient_allergy .note_id) 

(E  ?notes ,note_id) 

(E  ?patient .allergy . drug_name) 

(E  ?admis s ions .pat ient_id) 

(E  ?admissions.admission_time) 

\/{  /\{  Patient .Allergy ( 

?patient_allergy .patient.id, 

?patient_allergy . drug.name , 

?patient .allergy .note.id) , 

Notes_Text(?notes .note.id,  ?notes_text .text) , 

?patient_allergy .patient.id  =  ?admissions .patient.id, 

?patient .allergy .note.id  =  ?notes .note.id, 

?patient_allergy .drug.name  =  druggo, 

Admissions (?admissions .patient.id,  ?admissions.admission_time) , 
01-jan-94  <  ?admissions . admission.time}} 
where  the  variables  in  the  two  formulas  are  associated  as  follows: 
?PATIENTS. PATIENT.ID  <-->  ? ADMISSIONS . PATIENT.ID 
?PATIENTS_TRANSACTION_TIME . TRANSACTION.TIME 
<— >  ? ADMISSIONS. ADMISSION.TIME 
Continue?  (Y  or  N):  y 

Attempting  to  derive  query  on  combination  of  HOSPITAL.DATABASE.i 
and  H0SPITAL.DATABASE.2  . . . 

Succeeded ! 

Applying  the  rules  for  transforming  a  query  on  the  Clinic  database  into  a 
query  on  Hospital  database  2,  the  other  part  of  the  full  Hospital  database 
succeeded  in  eliminating  the  remaining  relations  of  the  Clinic  database.  The 
additional  rule  that  was  used  in  this  case  is 

Patient-Allergy(p ,  d,  nj)  A  Notes-Text(n 2,  x)  A  nj  =  n 2 

— *  Drug -Allergy -Text{p,  d,  x). 
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Logical  form  of  derived  query  is: 

(E  ? admissions .patient_id) 

(E  ?admissions.admission_time) 

(E  ?drug_allergy_text .drug.id) 

\/{  /\{  ?drug_allergy_text. patient.id  =  ?admissions .patient_id, 
?drug_allergy_text.drug_id  =  druggo, 

Admissions (?admissions .patient. id,  ? admissions. admission_time) , 
01-jan-94  <  ?admissions .admission_time, 

Drug_Allergy_Text ( 

?drug_allergy_text .patient_id, 

?drug_allergy_text . drug_id, 

?drug_allergy_t  ext . t  ext ) » 

where  the  variables  in  the  two  formulas  are  associated  as  follows: 

’PATIENT. ALLERGY. PATIENT.ID  <— >  ?DRUG_ALLERGY_TEXT . PATIENT.ID 
?PATIENT_ALLERGY . DRUG.NAME  <-->  ?DRUG_ALLERGY_TEXT . DRUG.ID 
?NOTES_TEXT . TEXT  <— >  ?DRUG_ ALLERG Y.TEXT . TEXT 
Continue?  (Y  or  N) :  y 

Splitting  logical  query  over  combined  databases  into  multiple  queries  over 
individual  databases  plus  ‘glue’... 

The  logical  formula  has  been  split  into  the  following  formulas: 

(E  ?admissions . admission. time) 

A{  Admissions (?admissions .patient.id,  ?admissions . admission.time) , 
01-jan-94  <  ?admissions . admission.time}} 

(E  ?drug_allergy_text.drug_id) 

\/{  /\{  Drug_Allergy_Text( 

?drug_allergy_text .patient.id, 

?drug_allergy_text .drug.id, 

?drug_allergy_text.text) , 

?drug_allergy .text. drug. id  =  druggo}} 

plus  the  following  ‘glue’  equations: 

?drug_allergy_text. patient.id  =  ?admissions .patient.id 

Continue?  (Y  or  N):  y 

Translating  logical  formulas  to  SQL  queries... 

( : sql-tree 

(: select-items 

(: select-item  (.-attribute  ADMISSIONS  PATIENT.ID) 

(: alias  NIL))) 

( : relations 

(: relation  ADMISSIONS)) 
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( : constraints 
( : constraint 

Opredicate  <) 

( :literal-valne  01-JAN-94) 

(: attribute  ADMISSIONS  ADMISSION.TIME) ) ) ) 

( : sql-tree 

(: select-items 

(: select-item  (: attribute  DRUG_ALLERGY_TEXT  TEXT) 

(: alias  TEXT)) 

(: select-item  (: attribute  DRUG_ALLERGY_TEXT  PATIENT_ID) 

(: alias  PATIENT.ID))) 

(: relations 

( : relation  DRUG_ALLERGY_TEXT) ) 

( : constraints 
( : constraint 

Opredicate  =) 

( : attribute  DRUG_ALLERGY_TEXT  DRUG. ID) 

(: literal-value  DRUGGO)))) 

Continue?  (Y  or  N):  y 

Converting  from  minimized  representation  to  actual  representation 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  ADMISSIONS  PATIENT.ID) 

(: alias  NIL))) 

(: relations 

(: relation  ADMISSIONS)) 

( : constraints 
( : constraint 
Opredicate  <) 

( :literal-value  01-JAN-94) 

( : attribute  ADMISSIONS  ADMISSION.TIME) ) ) ) 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  DRUG.ALLERGY  TEXT) 

(: alias  TEXT)) 

(: select-item  (: attribute  DRUG.ALLERGY  PATIENT.ID) 

(: alias  PATIENT.ID))) 

(: relations 

(•.relation  DRUG.ALLERGY)) 

( : constraints 
( : constraint 

(: predicate  =) 

( : attribute  DRUG.ALLERGY  DRUG.ID) 
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(: literal-value  DRUGGO)))) 


Continue?  (Y  or  N):  y 


Writing  SQL  queries  on  remote  databases  to  files  "demo-3-out-l . sql" 
and  "demo-3-out-2 . sql" . . . 

Query  on  HOSPITAL_DATABASE_i  has  been  written  to  file  "demo-3-out-i . sql" 
Query  on  H0SPITAL_DATABASE_2  has  been  written  to  file  "demo-3-out-2.sql" 

Continue?  (Y  or  N):  y 

Joining  tables  in  files  "demo-3-in-l .tbl"  and  "demo-3-in-2.tbl"  for  use  as 
response  to  original  query 
Will  write  result  to  "demo-3-out .tbl"  ... 

Table  read  from  file  "demo-3-in-l .tbl" ,  and  file  deleted 
Table  read  from  file  "demo-3-in-2 .tbl" ,  and  file  deleted 
Joined  table  has  been  written  to  file  "demo-3-out .tbl" 

Now,  the  fourth  and  final  phase  of  the  demonstration. 

Reading  SQL  from  file  "demo-4-in. sql"  ... 

This  time,  the  query  is  from  a  slightly  modified  Clinic  database  that 
uses  specially-generated  patient  IDs  rather  than  SSNs.  The  correlation 
between  IDs  and  SSNs  is  stored  in  another  restricted-access  Clinic  database,  which 
the  issuer  of  this  query  has  permission  to  access. 

Abstract  syntax  tree  for  the  SQL  form  is : 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  PATIENT. ALLERGY  DRUG.NAME) 

(: alias  NIL)) 

(: select-item  (: attribute  NOTES  TEXT) 

(: alias  NIL))) 

(: relations 

(:relation  NOTES) 

( : relation  PATIENT. ALLERGY ) ) 

( : constraints 
( : constraint 

(rpredicate  =) 

(rattribute  NOTES  NOTE.ID) 

(: attribute  PATIENT.ALLERGY  NOTE.ID)) 

( : constraint 

(:predicate  =) 

(rattribute  PATIENT.ALLERGY  PATIENT. ID) 

(.•literal-value  Plllil)))) 


112 


Continue?  (Y  or  N):  y 

Filling  in  omitted  attributes  and  aliases  in  SQL  tree  . . . 
Completed  abstract  syntax  tree  for  the  SQL  form  is: 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  PATIENT.ALLERGY  DRUG.NAME) 
(: alias  DRUG_NAME)) 

(: select-item  (: attribute  NOTES  TEXT) 

(: alias  TEXT))) 

(: relations 

(: relation  NOTES) 

( : relation  PATIENT. ALLERGY) ) 

( : constraints 
( : constraint 

(: predicate  =) 

( : attribute  NOTES  NOTE.ID) 

(: attribute  PATIENT. ALLERGY  NOTE.ID) ) 

( : constraint 

(: predicate  =) 

( : attribute  PATIENT.ALLERGY  PATIENT.ID) 

(: literal-value  Plllll)))) 

Continue?  (Y  or  N):  y 

Converting  query  to  minimized  representation  . . . 

Minimized  abstract  syntax  tree  is: 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  PATIENT.ALLERGY  DRUG.NAME) 
(: alias  DRUG.NAME)) 

(: select-item  (: attribute  NOTES.TEXT  TEXT) 

(: alias  TEXT))) 

(relations 

(:relation  NOTES) 

(: relation  PATIENT.ALLERGY) 

( : r elat ion  NOTES.TEXT ) ) 

( : constraints 
( : constraint 

(: predicate  =) 

( : attribute  NOTES  NOTE.ID) 

(: attribute  PATIENT.ALLERGY  NOTE.ID)) 

( : constraint 

(:predicate  =) 

( : attribute  PATIENT.ALLERGY  PATIENT.ID) 

(: literal-value  Plllll)) 

( : constraint 

(:predicate  =) 

( : attribute  NOTES  NOTE.ID) 

(: attribute  NOTES.TEXT  NOTE.ID)))) 

Continue?  (Y  or  N):  y 
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Translating  SQL  query  to  logical  formula  . . . 

Logical  form  of  query  is: 

(E  ?patient_allergy .note_id) 

(E  ?patient_allergy .patient_id) 

(E  ?notes .note_id) 

(E  ?notes_text .note_id) 

\/{  /\{  Notes(?notes .note_id) , 

Patient_Allergy( 

?patient_allergy .patient_id, 

?patient_allergy . drug_name , 

?patient_allergy .note_id) , 

Notes_Text(?notes_text .note_id,  ?notes_text .text) , 

?notes .note_id  =  ?patient_allergy .note_id, 

?patient_allergy ,patient_id  =  plllll, 

?notes ,note_id  =  ?notes_text .note_id}} 
where  the  free  variables  of  the  formula  are  associated  with  query  attribute 
aliases  as  follows: 

?PATIENT_ALLERGY . DRUG_NAME  <— >  DRUG_NAME 
?NOTES_TEXT .  TEXT  <— >  TEXT 
Continue?  (Y  or  N):  y 
Simplifying  logical  formula  . . . 

Simplified  formula  is: 

(E  ?notes .note.id) 

(E  ?patient_allergy ,note_id) 

(E  ?patient_allergy .patient_id) 

\/{  /\{  Patient_All ergy ( 

?patient_allergy .patient_id, 

?patient_allergy . drug_name , 

?patient_allergy .note_id) , 

Notes_Text (?notes . note_id,  ?notes_text .text ) , 

?notes .note_id  =  ?patient_allergy.note_id, 

?patient_allergy .patient_id  =  plllll}} 

Continue?  (Y  or  N):  y 

Attempting  to  derive  queries  on  remote  databases  . . . 

Attempting  to  derive  query  on  HOSPITAL_DATABASE  . . . 

Writing  query  to  file  "demo-4-out-l . sql"  ... 

In  order  to  obtain  a  query  on  Hospital  database,  the  ID  ‘Plllll’  must  be 
transformed  into  a  SSN.  So  the  mediator  issues  an  auxiliary  query  to  obtain 
this  patient’s  SSN.  The  rule  being  used  that  causes  the  query  to  be  issued  is 

?paiient. allergy. patient Jd  =  k  — *•  ? patient-allergy. patient. id  =  k'  (k  a  constant), 
where  k'  ts  the  result  returned  when  the  query 
SELECT  SSN 

FROM  PERSONAL_DATA 
WHERE  PATIENT.ID  =  k; 
ts  run  on  the  restricted- access  Clinic  database. 

Query  written  to  file  "demo-4-out-l . sql" 
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Table  read  from  file  "demo-4-in-i .tbl" ,  and  file  deleted 
The  mediator  now  knows  that  Plllll’s  SSN  is  123-45-6789. 

Succeeded! 

Logical  form  of  derived  query  is: 

(E  ?drug_allergy_text .patient_id) 

\/{  /\{  ?drug_allergy_text .patient_id  =  123-45-6789, 
Drug_Allergy_Text ( 

?drug_allergy_text .patient_id, 

?drug_ allergy _text . drug_id , 

?drug_allergy _t  ext . t  ext ) » 

where  the  variables  in  the  two  formulas  are  associated  as  follows: 

?PATIENT_ALLERGY . PATIENT_ID  <— >  ?DRUG_ALLERGY_TEXT . PATIENT_ID 
?PATIENT_ ALLERGY. DRUG_NAME  <-->  ?DRUG_ALLERGY_TEXT.DRUG_ID 
?NOTES_TEXT . TEXT  <— >  ?DRUG_ALLERGY_TEXT . TEXT 
Continue?  (Y  or  N):  y 

Translating  logical  formula  to  SQL  query  . . . 

( : sql-tree 

( : select-items 

(: select-item  (: attribute  DRUG_ALLERGY_TEXT  TEXT) 

(: alias  TEXT)) 

(: select-item  (: attribute  DRUG_ALLERGY_TEXT  DRUG_ID) 

(: alias  DRUG_NAME) ) ) 

(: relations 

( : relation  DRUG_ALLERGY_TEXT ) ) 

( : constraints 
( : constraint 

(:predicate  =) 

( : attribute  DRUG_ALLERGY_TEXT  PATIEHT.ID) 

(: literal-value  123-45-6789)))) 

Continue?  (Y  or  H) :  y 

Converting  from  minimized  representation  to  actual  representation  . . . 
( : sql-tree 

(: select-items 

(: select-item  (: attribute  DRUG_ALLERGY  TEXT) 

(: alias  TEXT)) 

(:select-item  (:attribute  DRUG_ALLERGY  DRUG_ID) 

(: alias  DRUG.NAHE))) 

(: relations 

( : relation  DRUG.ALLERGY) ) 

(constraints 
( : constraint 

(: predicate  =) 

( : attribute  DRUG.ALLERGY  PATIENT_ID) 

(: literal-value  123-45-6789)))) 

Continue?  (Y  or  H):  y 

Writing  SQL  query  on  remote  database  to  file  "demo-4-out-2.sql". . . 
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Query  has  been  written  to  file  "demo-4-out-2 . sql" 

Continue?  (Y  or  N):  y 

Converting  table  in  file  "demo-4-in-2.tbl"  for  use  as  response  to 
original  query 

Will  write  result  to  "demo-4-out .tbl"  ... 

Table  read  from  file  "demo-4-in-2 . tbl" ,  and  file  deleted 
Converted  table  has  been  written  to  file  "demo-4-out . tbl" 

This  completes  the  last  phase  of  the  demonstration,  so  we  exit  the 
mediator. 


C-c 


»Break:  Keyboard  interrupt 

LUCID :  '/.SLEEP : 

:C  0:  Return  from  Break 

->  (quit) 

1 
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